Fitting an elephant with four non-zero parameters (arxiv.org)
307 points by belter 67 days ago | 147 comments



I love the ironic side of the article. Perhaps they should add the reason behind it, from Fermi and von Neumann. When you are building a model of reality in physics, if something doesn't fit the experiments, you can't just add a parameter (or more), vary it, and fit the data. Ideally the model should have zero parameters, or the fewest possible, or, at an even deeper level, the parameters should emerge naturally from some simple assumptions. With four parameters you don't know whether you are really capturing a true aspect of reality or just fitting the data of some experiment.


This was mentioned in the first paragraph of the paper. The paper is mostly humorous.

That said, the wisdom of the quip has been widely lost. In many fields, data is "modeled" with huge regression models with dozens of parameters, or even neural networks with billions of parameters.

> In 1953, Enrico Fermi criticized Dyson’s model by quoting Johnny von Neumann: “With four parameters I can fit an elephant, and with five I can make him wiggle his trunk.”[1]. This quote is intended to tell Dyson that while his model may appear complex and precise, merely increasing the number of parameters to fit the data does not necessarily imply that the model has real physical significance.


> > In 1953, Enrico Fermi criticized Dyson’s model by quoting Johnny von Neumann: “With four parameters I can fit an elephant, and with five I can make him wiggle his trunk.”

For those who are interested, you can watch Freeman Dyson recount this conversation in his own words in an interview: https://youtu.be/hV41QEKiMlM


That's how I feel about dark matter. Oh this galaxy is slower than this other similar one. The first one must have less dark matter then.

What can't be fit by declaring the amount of dark matter that must be present fits the data? It's unfalsifiable: just because we haven't found it doesn't mean it doesn't exist. It's even worse than string/M-theory, which at least has math.


The dark matter theory is falsifiable. Sure we can't see dark matter (it doesn't interact electromagnetically), but we can see its effects, and it has to follow the laws of physics as we understand them today.

It is actually a satisfying theory with regard to Occam's razor. We don't have to change our laws of physics to explain the abnormal rotation of galaxies; we just need "stuff" that we can't see but that interacts gravitationally. When we already have stuff like neutrinos, it is not that far-fetched. In fact, though unlikely given our current understanding of physics, dark matter could be neutrinos.

If, as it turns out, the invisible stuff we call dark matter doesn't follow the laws of physics as we know them, then the dark matter theory is falsified and we need a new one (or at least some tweaks). And that may actually be the case, as a recent paper claims that gravitational lensing doesn't match the predictions of the dark matter theory.

The main competitor to dark matter is modified gravity, which calls for no new stuff but changes the equations for gravity. By Occam's razor, adding some random term to an equation is not really better than adding some invisible but well-characterized stuff, especially when we consider that the equation in question is extremely well tested. It is, of course, also falsifiable.

The problem right now is not that these theories are unfalsifiable, it is that they are already pretty much falsified in their current form (dark matter less than modified gravity), and some rework is needed.


'dark matter' is not a theory, it is the name of an observational problem.

There are many theories to explain dark matter observations. MOND is not a competitor with 'dark matter', because MOND is a theory and it tries to explain some aspects (spiral galaxy rotation) of what is observed as the dark matter problem, which consists of many more observations. There is no competition here. There are other theories to explain dark matter, like dark matter particle theories involving neutrinos or whatever, and these may be called competitors, but dark matter itself is not a theory, but a problem statement.


Yes and no...MOND's core proposition is that dark matter doesn't exist, and instead modified gravity does.

Whereas you can have many proposals for what dark matter is, provided it is capable of being almost entirely only gravitationally interacting, and there's enough of it.

MOND has had the problem that, depending on which MOND you're talking about, it still doesn't explain all the dark matter (so now you're pulling free parameters on top of free parameters).


I've seen this opinion before, but can't seem to square it with any reliable source.

Wikipedia has: "dark matter is a hypothetical form of matter that appears not to interact with light or the electromagnetic field ... Although the astrophysics community generally accepts dark matter's existence, a minority of astrophysicists, intrigued by specific observations that are not well-explained by ordinary dark matter, argue for various modifications of the standard laws of general relativity. These include modified Newtonian dynamics, tensor–vector–scalar gravity, or entropic gravity."

Even its name, "dark matter", sort of strongly implies this. If someone were just trying to refer to the observations, rather than a specific explanation for the observations, wouldn't they just say "abnormal galaxy rotation curves" rather than "dark matter"?

I'm not saying Wikipedia is an end-all-be-all source on this, I'm just asking where you're getting this alternate definition. If it is somewhere reliable then perhaps the article needs to be rephrased.


As far as I know it's from here: https://www.youtube.com/watch?v=PbmJkMhmrVI As I said elsewhere in this thread, I think she's trying to make some meta point but ends up just muddying the water.


> By Occam's razor, adding some random term to an equation is not really better than adding some invisible but well-characterized stuff...

You're being too kind. It's worse. Especially when (in my understanding anyway) that added term doesn't even explain all the things dark matter does.


Adding any finite number of parameters is strictly better than adding an infinity of parameters (i.e. an arbitrary distribution of dark matter chosen to match the observations).


The distribution has to be consistent forward and backwards in time. It's a lot less arbitrary than you're implying, and adding a hundred parameters (or similar finite number) to gravity is not better.


If we add an arbitrary amount of dark matter everywhere to match the observed motions of the celestial bodies, that adds an infinity of parameters, and not even an enumerable one.

This obviously can match almost anything, and it has extremely low predictive power (many future observations may differ from predictions, which can be accounted for by some dark matter whose distribution was previously unknown), so it is a much worse explanation than a modified theory of gravity that would have only a finite number of additional parameters.


The reason this isn't true is that, by the hypothesis of dark matter, it follows gravity but not electromagnetism. As such it only fits distributions recoverable from evolving gravity. E.g. if we require a certain distribution today, it fixes the distribution at all other points in time, and we can use light-speed delay to look into the past to verify whether the distributions have evolved according to gravity.


All observations of individual galaxies occur at a specific point in time. We can't use light-speed delay to see the evolution of individual galaxies, only completely different galaxies at some other point in time. As such, each galaxy gets its own value for the amount of dark matter.

At minimum this is a ~200 billion parameter model, and more if you’re looking at smaller structures.


That's equally true of the distribution of baryonic matter. We have to assess each galaxy individually to figure out what it's made of? What a crime against science. Never mind that they're still all made of a small handful of types of parts, which can nevertheless combine to form lots of possible histories and shapes for individual objects. Just like literally everything else in the observable universe. Seriously, what part of this argument is different for computing the amount of visible mass in each galaxy?


The ability to detect visible light from stars, or to detect that light being blocked by baryonic matter.

With dark matter it's two steps removed: we're inferring the behavior of baryonic matter, then inferring the amount of baryonic matter we aren't observing, and then calculating the amount of dark matter needed to produce that behavior after accounting for the undetected baryonic matter.


Yeah, that's a pain, but calculating mass from photons is still pretty indirect. More importantly, and independently of "directness", no one pretends that galaxies having different masses introduces two billion parameters into our models of cosmology. Because that's not what a model of cosmology is.


Calculating what percentage of the universe's observable mass is dark matter adds 200+ billion parameters, because the mass fraction of each galaxy varies.

So there's no simple way to calculate it from, say, looking at the Milky Way alone and extrapolating from the baryonic mass of the rest of the universe. Trying to approximate things from a representative sample is its own problem.


You're still confusing a physics model with a map of the universe. That said, it's sure a heck of a coincidence that the number they get from adding up estimated dark matter in galaxies lines up with the number they get from other cosmological measurements, isn't it? Almost like galaxy rotation curves aren't the only evidence for dark matter and haven't been for a long time. https://en.wikipedia.org/wiki/Dark_matter#Observational_evid...


Gravitational lensing, velocity dispersions, etc. circle back to the total mass of galaxies. So it shows up in many of the ways we calculate the total mass fraction, not just rotation anomalies.


Many of the ways? I guess that's it then, there's no reason to look at the whole picture, and especially no point in reading all the way to https://en.wikipedia.org/wiki/Dark_matter#Cosmic_microwave_b...


You're being redundant. "Many" in this instance obviously implied not every.


Then, in what way is it relevant to my claims here? Namely,

1. Dark matter does not meaningfully introduce billions more parameters into cosmological models than they already have, and

2. Individual galaxies' dark matter fractions are not essential to (not proving, but) strongly suggesting dark matter exists.


Something being consistent with a model is different than something being sufficient evidence on its own to support a model.

If the observed dark matter fractions of all known galaxies were 0% but the CMB was unchanged we wouldn’t assume dark matter exists. Thus your #2 is false. There’s infinite models consistent with any observation so finding something after a model was created for other reasons is useful as validation, but the chain of logic is still dependent on the prior observations not the model.

In a meaningfully different cosmos different observations would have happened and different models would exist. Trying to pick out specific experiments as sufficient on their own glosses over that particular limitation.


> If the observed dark matter fractions of all known galaxies were 0% but the CMB was unchanged we wouldn’t assume dark matter exists.

No, astrophysicists would eventually figure out something was up when they couldn't replicate the actual spectrum with dark-matter-free simulations. Why would you assume otherwise? Unless you want to dig into the assumptions of the scenario, in which case you're probably proposing a self-inconsistent universe so of course you can draw whatever conclusions you want from it.

> There’s infinite models consistent with any observation...

You can't actually believe this and still believe in science. If observations don't constrain models, then there is no point in observing. And in the long run, there's asymptotically no difference between "prior observations" and later observations. They're just observations that all go into the same model-constraining mill. Scientists are not fools, and are capable of realizing when an initial observation put them on a wrong trail.

You're still barely touching the real point. This all just sounds like rationalizations to avoid the fact that dark matter, for now at least, and for all that it genuinely sucks, is the Occam's razor explanation for the full suite of observations. Why is this so hard to accept?


> to match the observed motions of the celestial bodies

The point is that even with current observational data there's no reasonable distribution of dark matter that correctly explains all evidence that we have.

Your intuition that "if I have an infinite number of degrees of freedom anything at all can be fit" is leading you astray here.


> Sure we can't see dark matter (it doesn't interact electromagnetically), but we can see its effects

Even this is granting too much: "seeing it" and "seeing its effects" are the same thing. No one has ever "directly seen", in the sense that internet DM skepticism demands, anything other than a photon.


"Seeing" is indeed a poorly chosen word.

The problem with dark matter is that there does not exist any second relationship from which to verify its existence, like in the case of normal matter, which takes part in a variety of interactions that lead to measurable effects, which can be compared.

The amount and the location of dark matter is computed from the gravitational forces that explain the observed movements of the bodies, but there are no additional relationships with any other data, which could corroborate the computed distribution of dark matter. That is what some people mean by "seeing".


All major DM candidates also have multiple interactions: that's the WI in WIMP, for instance. In fact I don't know that anyone is seriously proposing that dark matter is just bare mass with no other properties - aside from the practical problems, that would be a pretty radical departure from the last century of particle physics.


No interactions have been found, despite a lot of resources put into the search. So currently all dark matter particle theories apart from "non-interacting" have been falsified. And non-interacting theories are probably unfalsifiable.

Radical departure may well be needed, for other reasons too.


> The problem with dark matter is that there does not exist any second relationship from which to verify its existence.

This is exactly it! Dark matter is strictly defined by its effects. The only 'theory' part is a belief that it's caused by a yet-to-be-found particle that's distributed to fit observations. Take all the gravitational anomalies that we can't explain with ordinary matter, then arbitrarily distribute an imaginary 'particle' that solves them: that's DM.

The problem is that the language used to talk about DM is wrong. It's not that DM doesn't interact with EM, or that the presence of DM is causing galaxies to rotate faster than their observed mass would allow. These are all putting the cart before the horse. What we have is unexplained gravitational effects being attributed to a hypothetical particle. If we discovered a new unexplained gravitational property, we would merely add it to the list of DM's attributes rather than say "oh, then it can't be DM".


> Dark matter is strictly defined by its effects

All physical entities are defined by their effects! Suppose we found axions and they had the right mass to be dark matter. Would that mean we now "really knew" what dark matter was, in your sense? No, it would just push the defining effects further back - because all an axion is is a quantum of the strong CP-violation term promoted to a field.

Just like the electromagnetic field is the one that acts on charged particles in such and such a way, and a particle is charged if the electromagnetic field acts on it in that way. There's no deeper essence, no intuitive "substance" with some sort of intrinsic nature. All physical properties are relational.


I used to think this, but dark matter does make useful predictions that are hard to explain otherwise.

This is partially because there are two ways to detect dark matter. The first is gravitational lensing. The second is the rotational speed of galaxies. There are some galaxies that need less dark matter to explain their rotational speed. We can then cross-check whether those galaxies cause less gravitational lensing.

Besides that, the fact that the gravitational lensing of galaxies is stronger than the bright matter in the galaxies can justify is hard to explain without dark matter.


The problem with dark matter is that there's no (working) theory on how the dark matter is distributed. It's really easy to "explain" gravitational effects if you can postulate extra mass ad-hoc to fit the observations.


I dunno if this is the correct way of thinking about it, but I just imagine it as a particle that has mass but does not interact with other particles (except at big-bang like energy levels?). So essentially a galaxy would be full of these particles zipping around never colliding with anything. And over time, some/most of these particles would have stable orbits (as the ones in unstable orbits would have flown off by now) around the galactic core. And to an observer, it would look like a gravitational tractor ahead of the rest of the physical mass of the galaxy (which is slower because it is affected by things like friction and collisions?). And so you'd see galaxies where the arms are spinning faster than they should be?


> I dunno if this is the correct way of thinking about it, but I just imagine it as a particle that has mass but does not interact with other particles (except at big-bang like energy levels?).

Not even anything that extreme. What's ruled out is interaction via electromagnetism (or if you want to get really nit-picky, electromagnetic interaction with a strength above some extremely low threshold).


If there are two different types of observations, and one parameter can explain both, that is pretty strong evidence. Put differently, dark matter is falsifiable, and experiments have tried to falsify it without success.

Besides, the idea that 'not all mass can be seen optically' is not that surprising. The many theories on what that mass might be are all speculation, but they are treated as such.


It's worth noting that one dark matter explanation is just: it's cold matter we just can't see through telescopes. Or black holes without accretion disks.

Both of these are pretty much ruled out though: you can't plausibly add enough brown dwarfs, and if it's black holes then you should see more lensing events towards nearby stars given how many you'd need.

But they're both concrete predictions which are falsifiable (or boundable such that they can't be the dominant contributors).


Dark matter is constrained by, among other things, dynamical simulations. For instance, here's an example of reproducing real world observations, that previously didn't have great explanations, using simulations with dark matter: https://www.youtube.com/live/8rok8E_tz8k?si=Q7vmQYpZr_6K7--m. And that's not even getting into the cosmology that has to (and mostly does) fit together.


Interesting that you should link that video. Its title card says "Angela Collier". Here's a more recent video by the physicist[0].

Re: where it says "using simulations with dark matter", we can't simulate DM because it doesn't have any properties beyond our observations. All we do is distribute amounts of it to match observations. It could be "Dyson spheres with EM shields" and the results would be the same.

[0] https://www.youtube.com/watch?v=PbmJkMhmrVI


Yes, and I think that video is stupid. She doesn't use the term that way in her own talk, and neither does any scientist I've ever heard. I think she's trying to make some abstract point about science in general and muddying the water in the process. Her takes on terminology are often bad IMO.

That doesn't take away the fact that when you work with the slightly more specific theory of "particle dark matter" it produces real results. And I believe there's a lot more work over the years in similar areas. It doesn't get talked about because it's not sexy, so people who only follow cosmology when there's drama don't hear about it. That was just the example at the top of my mind because I'd seen it recently, and the result is really quite spectacular. Did you watch it through?


> What can't be fit by declaring the amount of dark matter that must be present fits the data?

Tons of things - just like there are tons of things that can't be fit by declaring the amount of electromagnetically-interacting matter that must be present fits the data.

You can fit anything you like by positing new and more complicated laws of physics, but that's not what's going on here. Dark matter is ordinary mass gravitating in an ordinary way: the observed gravitational lensing needs to match up with the rotation curves needs to match up with the velocity distributions of galaxies in clusters; you don't strictly need large scale homogeneity and isotropy but you really really want it, etc. Lambda-CDM doesn't handle everything perfectly (which in itself demonstrates that it's not mindless overfitting) but neither does anything else.


You also have to do other things like not break General Relativity.

Which MOND does: it creates huge problems fitting into GR.

Whereas dark matter as just regular mass that interacts poorly by other means does not.


There are modified gravity theories that are compatible with or extensions of GR, e.g. the f(R) gravity theories.

Probably nobody believes MOND as such is a fundamental theory; rather, as a "theory" it's sort of a stepping stone. Also, MOND is often used interchangeably (and confusingly) with modified gravity theories in general.


> Dark matter is ordinary mass gravitating in an ordinary way: the observed gravitational lensing needs to match up with the rotation curves needs to match up with the velocity distributions of galaxies in clusters

Those are all the same thing, the shape of spacetime. The only thing DM adds is a backstory that this shaping comes from hypothetical undiscovered particles with properties that match observations.


It's easy to say "Epicycles! Epicycles!", but people are going to continue using their epicycles until a Copernicus comes along.


Well, the funny thing is Copernicus posits just about as many epicycles in his theory as previous geocentric theories. Only Kepler’s discovery of the equal area law and elliptical orbits successfully banishes epicycles.


The history of these discoveries is fascinating and shows that Kuhn's scientific revolutions idea is wrong, but it's always rounded off to "Copernicus and Galileo" and doesn't even get them right.


There will be no Copernicus if everybody just studies epicycles. E.g. there are massive resources put into the desperate WIMP hunt that could be used for finding new theories.


I don't see how those resources are fungible with each other.


Research funding is very competitive and scarce.


Building a physics machine, and thinking about how the equations might work, are so very different.

Does the latter even get funding?


Of course. It's called theoretical physics.


I think a model with zero parameters belongs more to math, because it can be derived from first principles. E.g. the surface area of a sphere is 4 * pi * r^2, assuming Euclidean space. Physics begins when we have at least one constant of nature to measure, like the actual curvature of space or the attraction due to gravity.


Hmm..

Hodgkin and Huxley did ground-breaking work on the squid giant axon and modelled neural activity. They had multiple parameters extracted from 'curve fitting' of recorded potentials and injected currents, which were much later mapped to sodium channels. Similarly, another process was mapped to potassium channels.

I wouldn't worry too much about having multiple parameters -- even four, when three just can't explain the model.


Neuron anatomy is the product of hundreds of millions of years of brute contingency. There are reasons why it can't be certain ways (organisms that were that way [would have] died or failed to reproduce) but no reason whatsoever why it had to be exactly this way. It didn't, there are plenty of other ways that nerves could have worked, this is just the way they actually do.

The physics equivalent is something like eternal inflation as an explanation for apparent fine-tuning - except that even if it's correct it's still absolutely nowhere near as complex or as contingent as biology.


The balance between empirical data fitting and genuine understanding of the underlying reality


Notably done for the first time IRL in "Least square fitting of an elephant", James Wei (1975), Chemtech.



Isn't the form of an equation really just another sort of parameter?


This is why I think that modeling elementary physics is nothing other than fitting data. We might end up with something that we perceive as "simple", or not. But in any case all the fitting has been hidden in the process of ruling out models. It's just that a lot of the fitting process is (implicitly) done by theorists; we come up with new models that are then falsified.

For example, how many parameters does the Standard Model have? It's not clear what counts as a parameter. Do you count the group structure, or the other mathematical structure that has been "fitted" through decades of comparisons with experiments?


You are using the word "fitting" rather loosely. We usually "fit" models of fixed function form and fixed number of parameters.

You are also glossing over centuries of precedent that predate high-energy physics, namely quantum field theory, special relativity, and foundational principles such as conservation of energy and momentum.


It tends to be a parameter that can be derived from reasoning and assumptions. This contrasts with free parameters, where you say "we have no idea what this value should be, so we'll measure it".


Yes, it is.

Which makes the only truly zero parameter system the collection of all systems, in all forms.


Kolmogorov complexity[0] solves this loophole :)

[0] https://en.wikipedia.org/wiki/Kolmogorov_complexity


Kolmogorov complexity is an effort to wrangle it. It's impossible to fully solve.

You can change measured complexity by altering the baseline assumptions.


What do you mean by baseline assumptions? The low level computer instructions?


This is humorous (and well written), but I think it's more than that.

I'm always making the joke (observation) that ML (AI) is just curve-fitting. Whether "just curve-fitting" is enough to produce something "intelligent" is, IMO, currently unanswered, largely due to differing viewpoints on the meaning of "intelligent".

In this case they're demonstrating some very clean, easy-to-understand curve-fitting, but it's really the same process -- come up with a target, optimize over a loss function, and hope that it generalizes (this one, obviously, does not, but the elephant is cute).
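
Concretely, the whole loop fits in a few lines. This is only a toy sketch -- the target, the one-parameter model, and the learning rate are made up for illustration, nothing here comes from the paper:

  import numpy as np

  rng = np.random.default_rng(0)

  # Come up with a target: noisy samples of y = 2.5 * x.
  x = rng.uniform(-1, 1, 50)
  y = 2.5 * x + 0.1 * rng.normal(size=x.size)

  # Optimize over a loss function: gradient descent on squared error
  # for a one-parameter model y_hat = w * x.
  w = 0.0
  for _ in range(200):
      grad = np.mean(2 * (w * x - y) * x)   # d/dw of the mean squared error
      w -= 0.1 * grad

  # Hope that it generalizes: evaluate on points the fit never saw.
  x_new = rng.uniform(-1, 1, 50)
  print(w, np.mean((w * x_new - 2.5 * x_new) ** 2))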

This raises the question Neumann was asking -- why have so many parameters? Ironically (or maybe just interestingly), we've done a lot with a ton of parameters recently, answering it with "well, with a lot of parameters you can do cool things".


> Whether "just curve fitting" is enough to produce something "intelligent" is, IMO, currently unanswered

Continual "curve fitting" to the real world can create intelligence. What is missing is not something inside the model. It's missing a mechanism to explore, search and expand its experience.

Our current crop of LLMs rides on human experience; they have largely not participated in creating their own experiences. That's why people call it imitation learning or parroting. But once models become more agentic they can start creating useful experiences on their own. AlphaZero did it.


There are a whole bunch of assumptions here. But sure, if you view the world as a closed system, then you have a decision as a function of inputs:

1. The world around you

2. The experiences within you (really, the past view of the world around you)

3. The innateness of you (sure, this could be 2, but I think it's also something else)

4. The experience you find + the way you change yourself to impact (1), (2), and (3)

If you think of intelligence as all of these, then you're making the assumption that all that's required for (2), (3), and (4) is "agentic systems", which I think skips a few steps (as the author of an agent framework myself...). All this is to say that "what makes intelligence" is largely unsolved, and nobody really knows, because we actually don't understand this ourselves.


> Continual "curve fitting" to the real world can create intelligence.

I'm going to need a citation on this bold claim. And by that I mean in the same vein as what Carl Sagan would say:

  Extraordinary claims require extraordinary evidence


I'd simply argue that what we do is precisely that, so either we are intelligent or we are not, however we might define that intelligence.


AlphaZero did not create any experiences. AlphaZero was software written by people to play board games and that's all it ever did.


Are you launching into a semantic argument about the word 'experience'? If so, it might help to state what essential properties AlphaZero was missing that make it 'not having an experience'.

Otherwise this can quickly devolve into the common useless semantic discussion.


Just making sure no one is confused by common computationalist sophistry and how they attribute personal characteristics to computers and software. People can have and can create experiences, computers can only execute their programmed instructions.


Username `soist is an abbreviation for `solipsist, then?


On what priors are you making that statement?


Rephrase your question. I don't know what you're asking.


I think he meant to ask, what is the difference between an experience and a predefined instruction?


AZ trained in self-play mode for millions of games, over multiple generations of a player pool.


I am familiar with the literature on reinforcement learning.


They're saying the board games AlphaZero played with itself are experiences.


And I am saying they are confused because they are attributing personal characteristics to computers and software. By spelling out what computers are doing it becomes very obvious that there is nothing that can be aware of any experiences in computers as it is all simply a sequence of arithmetic operations. If you can explain which sequence of arithmetic operations corresponds to "experiences" in computers then you might be less confused than all the people who keep claiming computers can think and feel.


> By spelling out what computers are doing it becomes very obvious that there is nothing that can be aware of any experiences in computers as it is all simply a sequence of arithmetic operations.

By spelling out what brains are doing it becomes very obvious that it's all simply a sequence of chemical reactions - and yet here we are, having experiences. Software will never have a human experience - but neither will a chimp, or an octopus, or a Zeta-Reticulan.

Mammalian neurons are not the only possible substrate for intelligence; if they're the only possible substrate for consciousness, then the fact that we're conscious is an inexplicable miracle.


If an algorithmic process is an experience and a collection of experiences is intelligence then we get some pretty wild conclusions that I don't think most people would be attempting to claim as it'd make them sound like a lunatic (or a hippy).

Consider the (algorithmic) mechanical process of screwing in a screw into a board. This screw has an "experience" and therefore intelligence. So... The screw is intelligent? Very low intelligence, but intelligent according to this definition.

But we have an even bigger problem. There's the metaset of experiences, that's the collection of several screws (or the screw, board, and screwdriver together). So we now have a meta intelligence! And we have several because there's the different operations on these sets to perform.

You might be okay with this, or maybe you're saying it needs memory. If the latter, you hopefully quickly realize this means a classic computer is intelligent, but due to the many ways information can be stored it does not solve our above conundrum.

So we must then come to the conclusion that all things AND any set of things have intelligence. Which kinda makes the whole discussion meaningless. Or, we must need a more refined definition of intelligence which more closely reflects what people actually are trying to convey when they use this word.


> If an algorithmic process is an experience and a collection of experiences is intelligence

Neither, what I'm saying is that the observable correlates of experience are the observable correlates of intelligence - saying that "humans are X therefore humans are Y, software is X but software is not Y" is special pleading. The most defensible positions here are illusionism about consciousness altogether (humans aren't Y) or a sort of soft panpsychism (X really does imply Y). Personally I favor the latter. Some sort of threshold model where the lights turn on at a certain point seems pretty sketchy to me, but I guess isn't ruled out. But GP, as I understand them, is claiming that biology doesn't even supervene on physics, which is a wild claim.

> Or, we must need a more refined definition of intelligence which more closely reflects what people actually are trying to convey when they use this word.

Well that's the thing, I don't think people are trying to convey any particular thing. I think they're trying to find some line - any line - which allows them to write off non-animal complex systems as philosophically uninteresting. Same deal as people a hundred years ago trying to find a way to strictly separate humans from nonhuman animals.


Continuing this reductio ad absurdum, you might reach the fallacious conclusion, as some famous cranks in the past did, that intelligence is even found in plants, animals, women, and even the uncivilized savages of the new continent.

Intelligence appears in gradients, not a simple binary.


> Intelligence appears in gradients, not a simple binary.

Sure, I'm in no way countering such a notion and your snarky comment is a gross mischaracterization of my comment. So far off I have a difficult time believing it isn't intentional.

The "surprise" is not that plants, animals, or even women turn out to be intelligent under the definition of "collection of experiences" but that rocks have intelligence, atom, photons, and even more confusingly groups of photons, the set of all doors, the set of all doors that such that only one door per city exists in the same set. Or any number of meta collections. This is the controversial part, not women being intelligent. Plants are still up for debate, but I'm very open to a broad definition of intelligence.

But the issue is that I, and the general fields of cognitive science, neuroscience, psychology, and essentially everyone except for a subset of computer scientists, agree that intelligence is more than a collection of experiences (including if that collection has memory). In other words, it is more than a Turing Machine. What that more is, is debated but it is still generally agreed upon that intelligence requires abstraction, planning, online learning, and creativity. But all these themselves have complicated nuanced definitions that are much more than what the average person thinks they mean. But that's a classic issue where academics use the same words normal people do but have far more restrictions on their meaning. Which often confuses the average person when they are unwilling to accept this fact that words can have different meanings under different contexts (despite that we all do this quite frequently and such a concept exists in both our comments).


You seem to use the word intelligence to mean `consciousness` (if you replaced the former with the latter I would agree with your argument).

I would define "intelligence" as (1) the ability to learn or understand or to deal with new or trying situations and (2) the ability to apply knowledge to manipulate one's environment.

It turns out that this is also the Merriam-Webster definition [0]. By that definition, yes AlphaZero was learning and understanding how to deal with situations and is intelligent, and yes most machine-learning systems and many other systems that have a specific goal and manipulate data/the environment to optimize for that goal, are intelligent.

By this definition, a non-living, non-conscious entity can be intelligent.

And intelligence has nothing to do with "experiences" (which seem to belong in the "consciousness" debate).

[0]: https://www.merriam-webster.com/dictionary/intelligence


This is a common retort. You can read my other comments if you want to understand why you're not really addressing my points because I have already addressed how reductionism does not apply to living organisms but it does apply to computers.


The comments where you demand an instruction set for the brain, or else you'll dismiss any argument saying its actions can be computed? Even after people explained that lots of computers don't even have instruction sets?

And where you decide to assume that non-computable physics happens in the brain based on no evidence?

What a waste of time. You "addressed" it in a completely meaningless way.


>It's missing a mechanism to explore, search and expand its experience.

Can't we create an agent system which can search the internet and choose what data to train itself with?


You need to define what the utility function of the agent is so it can know what to actually use to train itself. If we knew that, this whole debate about human intelligence in computers would either be solved already or well on its way to being solved.


In the case of AI, the more parameters, the better! In physics it's the opposite.


One of the hardest parts of training models is avoiding overfitting, so "more parameters are better" should be more like "more parameters are better given you're using those parameters in the right way, which can get hard and complicated".

Also LLMs just straight up do overfit, which makes them function as a database, but a really bad one. So while more parameters might just be better, that feels like a cop-out to the real problem. TBD what scaling issues we hit in the future.


A dichotomy between these fields


Your humorous observation captures a fundamental truth to some extent


I mean the devil is in the details. In Reinforcement Learning, the target moves! In deep learning, you often do things like early stopping to prevent too much optimization.


There is no such thing as too much optimization. Early stopping is to prevent overfitting to the training set. It's a trick just like most advances in deep learning because the underlying mathematics is fundamentally not suited for creating intelligent agents.


Is overfitting different from 'too much optimization'? Optimization still needs a value that is optimized. Overfitting is the result of too much optimization for not quite the right value (i.e. training error when you want to reduce prediction error).


What value is being optimized and how do you know it is too much or not enough?


I think the miscommunication is due to the proxy nature of our modeling. From one perspective, yes, you're right, because it just depends on your optimization function and objectives. But if we're in a context where we recognize that the practical usage of our model relies on it being an inexact representation (a proxy), then there certainly is such a thing as too much optimization. I mean, most of what we try to model in ML is intractable.

In fact, the entire notion of early stopping is due to this. We use a validation set as a pseudo test set to inject information into our optimization process without leaking information from the test set (which is why you shouldn't choose parameters based on test results -- that is spoilage; it doesn't matter if it's the status quo, it's spoilage).

But we also need to consider that a lack of divergence between train/val does not mean there isn't overfitting. Divergence implies overfitting, but the inverse statement is not true. I state this because it's both relevant here and an extremely common mistake.
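
To make the mechanics concrete, here is a bare-bones sketch of validation-based early stopping; the data, the deliberately over-parameterized polynomial model, and the patience threshold are all invented for illustration:

  import numpy as np

  rng = np.random.default_rng(0)

  # Noisy samples of a smooth target, split into train and validation.
  x = np.sort(rng.uniform(-1, 1, 90))
  y = np.sin(3 * x) + 0.1 * rng.normal(size=x.size)
  x_tr, y_tr, x_va, y_va = x[0::2], y[0::2], x[1::2], y[1::2]

  def design(x, degree=12):            # far more parameters than the data needs
      return np.vander(x, degree + 1)

  def mse(w, X, y):
      return np.mean((X @ w - y) ** 2)

  X_tr, X_va = design(x_tr), design(x_va)
  w = np.zeros(X_tr.shape[1])
  best_w, best_val, patience = w.copy(), np.inf, 0

  for step in range(50_000):
      w -= 0.05 * 2 * X_tr.T @ (X_tr @ w - y_tr) / len(y_tr)   # gradient step
      val = mse(w, X_va, y_va)
      if val < best_val - 1e-6:
          best_w, best_val, patience = w.copy(), val, 0
      else:
          patience += 1
          if patience > 1000:          # validation loss stopped improving
              break

The checkpointed best_w is what you keep. Whether that stopping point generalizes still has to be judged on a held-out test set, for exactly the spoilage reason above.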


Most practitioners seem to understand that what they are doing is creating executable models and they don't confuse the model based on numeric observations with the actual reality. This is why I very much do not like all the AI hype and how statistical models were rebranded as artificial "intelligence" because the people who are not aware of what the words mean get very confused and start thinking they are nothing more than computers executing algorithms to fit numerical data to some unspecified cognitive model.


> Most practitioners seem to understand that what they are doing is creating executable models and they don't confuse the model based on numeric observations with the actual reality.

I think you're being too optimistic, and I'm a pretty optimistic person. Maybe it is because I work in ML, but I've had to explain to a large number of people this concept. This doesn't matter if it is academia or industry. It is true for both management and coworkers. As far as I can tell, people seem very happy to operate under the assumption that benchmark results are strong indicators of real world performance __without__ the need to consider assumptions of your metrics or data. I've even proven this to a team at a trillion dollar company where I showed a model with lower test set performance had more than double the performance on actual customer data. Response was "cool, but we're training a much larger model on more data, so we're going to use that because it is a bit better than yours." My point was that the problem still exists in that bigger model with more data, but that increased params and data do a better job at hiding the underlying (and solvable!) issues.

In other words, in my experience people are happy to be Freeman Dyson in the conversation Calavar linked[0] and very upset to hear Fermi's critique: being able to fit data doesn't mean shit without either a clear model or a rigorous mathematical basis. Much of data science is happy to just curve fit. But why shouldn't they? You advance your career in the same way, by bureaucrats who understand the context of metrics even less.

I've just experienced too many people who cannot distinguish empirical results from causal models. And a lot of people who passionately insist there is no difference.

[0] https://news.ycombinator.com/item?id=40964328


Freeman Dyson recounts the episode [1] that inspired this paper in his Web of Stories interviews (cued up to the fitting-an-elephant bit in [2]).

[1] https://youtu.be/hV41QEKiMlM

[2] https://youtu.be/hV41QEKiMlM?t=118


Thanks for linking these. I was not very familiar with these works/discussions from the past, but they really helped establish the context. Very grateful that these videos are readily available.


I listened to the whole series with Dyson some time in the past year. It was well worth it. I also listened to the series with Murray Gell-Mann [1] and Hans Bethe [2]. All time well worth spending, and I've been thinking of downloading all the bits, concatenating them into audio files, and putting them on my phone for listening to when out on walks (I'm pretty sure the videos do not add anything essential: it's just a video of the interviewee talking - no visual aids).

[1] https://www.youtube.com/playlist?list=PLVV0r6CmEsFxKFx-0lsQD...

[2] https://www.youtube.com/watch?v=LvgLyzTEmJk&list=PLVV0r6CmEs...



Nice. This is like how you can achieve unlimited compression by storing your data in a filename instead of in the file.


From the paper:

> Paintadosi [4] argues that one parameter is always enough. He constructed a function that, through a single parameter, can depict any shape. However, in essence, this work is a form of encoding, mapping the shape into a real number with precision extending to hundreds or even thousands of decimal places. For our problem, this is meaningless, although the paper’s theme is that “parameter counting” fails as a measure of model complexity


The number of parameters is just the wrong metric; it should be the amount of information contained in the parameter values: their entropy, Kolmogorov complexity, or something along those lines.
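
A toy illustration of the point (unrelated to the paper's construction): two 8-bit "parameters" can be smuggled into a single real-valued one, so raw parameter count says almost nothing about information content.

  def pack(a: int, b: int) -> float:
      """Hide two 8-bit values inside one 'parameter'."""
      assert 0 <= a < 256 and 0 <= b < 256
      return float(a * 256 + b)

  def unpack(p: float) -> tuple[int, int]:
      n = int(p)
      return n // 256, n % 256

  p = pack(42, 200)             # one parameter carrying 16 bits of payload
  assert unpack(p) == (42, 200)

Piantadosi's one-parameter construction (quoted from the paper above) is this trick taken to the limit: arbitrarily many values packed into the decimal expansion of a single real number.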


That's like saying your entire hard drive is a single number.


https://github.com/philipl/pifs

> πfs: Never worry about data again!

> πfs is a revolutionary new file system that, instead of wasting space storing your data on your hard drive, stores your data in π! You'll never run out of space again - π holds every file that could possibly exist! They said 100% compression was impossible? You're looking at it!



"This single parameter model provides a large improvement over the prior state of the art in fitting an elephant"

Lol


Sadly, the constant term (the average r_0) is never specified in the paper (it seems to be something in the neighborhood of 180?): getting that right is necessary to produce the image, and I can't see any way not to consider it a fifth necessary parameter. So I don't think they've genuinely accomplished their goal.

(Seriously, though, this was a lot of fun!)


They say in the text that it's the average value of the data points they fit to. I think whether to count it as a parameter depends on whether you consider standardization to be part of the model or not.


I see your point, that it's really just an overall normalization for the size rather than anything to do with the shape. I can accept that, and I'll grant them the "four non-zero parameters" claim.

Though in that case, I would have liked for them to make it explicit. Maybe normalize it to "1", and scale the other parameters appropriately. (Because as it stands, I don't think you can reproduce their figure from their paper.)


Lol. Loved it.

This was a lovely passage from Dyson’s Web of Stories interview, and it struck a chord with me, like it clearly did with the authors too.

It happened when Dyson took the preliminary results of his work on the Pseudoscalar theory of Pions to Fermi and Fermi very quickly dismissed the whole thing. It was a shock to Dyson but freed him from wasting more time on it.

Fermi: When one does a theoretical calculation, either you have a clear physical model in mind or a rigorous mathematical basis. You have neither. How many free parameters did you use for your fitting?

Dyson: 4

Fermi: You know, Johnny Von Neumann always used to say ‘with four parameters I can fit an elephant; and with five I can make him wiggle his trunk’.


I wish there was more humor on arXiv.

If I could make a discovery in my own time without using company resources I would absolutely publish it in the most humorous way possible.


There's plenty of humor on arXiv, and that's part of why it's so incredible!

Some lists:

https://academia.stackexchange.com/questions/86346/is-it-ok-...

https://www.ellipsix.net/arxiv-joke-papers.html


Joke titles and/or author lists are also quite popular, e.g. the Greenberg, Greenberger, Greenbergest paper[1], a paper with a cat coauthor whose title I can’t seem to recall (but I’m sure there’s more than one I’ve encountered), or even the venerable, unfortunate in its joke but foundational in its substance Alpher, Bethe, Gamow paper[2]. Somewhat closer to home, I think computer scientist Conor McBride[3] is the champion of paper titles (entries include “Elimination with a motive”, “The gentle art of levitation”, “I am not a number: I am a free variable”, “Clowns to the left of me, jokers to the right”, and “Doo bee doo bee doo”) and sometimes code in papers:

  letmeB this (F you) | you == me = B this
                      | otherwise = F you
  letmeB this (B that)            = B that
  letmeB this (App fun arg)       = letmeB this fun `App` letmeB this arg
(Yes, this is working code; yes, it’s crystal clear in the context of the paper.)

[1] https://arxiv.org/abs/hep-ph/9306225

[2] https://en.wikipedia.org/wiki/Alpher%E2%80%93Bethe%E2%80%93G...

[3] http://strictlypositive.org/


> paper with a cat coauthor whose title I can't seem to recall

You probably have in mind https://en.wikipedia.org/wiki/F._D._C._Willard (coauthor of multiple papers, sole author of at least one).


Consider posting this as a new post! It seems like a fun list to read through


Pretraining on the Test Set Is All You Need

https://arxiv.org/abs/2309.08632


There is. It is called the General Mathematics section. What is funnier than a two-page proof of the Riemann Hypothesis?


I have to plug Dr. Octave Levenspiel. Levenspiel was a professor emeritus when I did my undergrad. He did much of the work on industrial fluidized beds, among other things. The elephant curve discussions were a criticism of the complex multi-parameter fitting for heterogeneous catalysis of the time. https://levenspiel.com/elephants/

He tried for a while to get an aerodynamics paper published on the flight of dinosaurs. http://levenspiel.com/wp-content/uploads/2016/02/DinosaurW.p...

This intellectual curiosity reminded me a bit of Feynman and his plate spinning.


No love for D'Arcy Thompson on growth and form? His parametric models for organisms were quite nice (if very simplistic)

https://en.wikipedia.org/wiki/On_Growth_and_Form


IIUC:

A real-parameter (r(theta) = sum(r_k cos(k theta))) Fourier series can only draw a "wiggly circle" figure with one point on each radial ray from the origin.

A complex-parameter series (z(theta) = sum(c_k e^(i k theta))) can draw more squiggly figures (epicycles) -- the pen can backtrack as the drawing arm rotates, as each parameter can move a point somewhere on a small circle around the point computed from the previous parameter (and recursively).

Obligatory 3B1B https://m.youtube.com/watch?v=r6sGWTCMz2k

Since a complex parameter is 2 real parameters, we should compare the best 4-cosine curve to the best 2-complex-exponential curve.


One take away: Don’t count parameters. Count bits.


Better yet, count entropy.


Why “better”? Entropy in the information theoretic sense is usually quantified in bits.


Ya know, in academic writing I tend to struggle with making it sound nice and formal. I try not to use the super-stilted academic style, but it is still always a struggle to walk the line between too loose and too jargony.

Maybe this sort of thing would be a really good tradition. Everyone must write a very silly article with some mathematical arguments in it. Then, we can all go forward with the comfort of knowing that we aren't really at risk of breaking new ground in appearing unserious.

It is well written and very understandable!


> It only satisfies a weaker condition, i.e., using four non-zero parameters instead of four parameters.

Why would that be a harder problem? In the case that you get a zero parameter, you could inflate it by some epsilon and the solution would basically be the same.


> In the case that you get a zero parameter, you could inflate it by some epsilon and the solution would basically be the same.

Not everything is continuous. Add an epsilon worth of torsion to GR and you don't get almost-GR, you get a qualitatively different theory in which potentially arbitrarily large violations of the equivalence principle are possible.


That's not relevant here though, because their function is continuous and they're fitting to an arbitrary shape. It's not a "perfect science," so there would be wiggle room.


They also, effectively, fit information in the indexes of the parameters. I.e., _which_ of the parameters are nonzero carries real information.

In a sense, they have done their fitting using nine parameters, of which five are zero.
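
That extra information is easy to quantify under this framing: if there are nine candidate coefficients and you are free to choose which four are nonzero, the sparsity pattern alone carries log2(C(9,4)) ≈ 7 bits on top of the four values themselves.

  from math import comb, log2

  print(log2(comb(9, 4)))   # ~6.98 bits carried by which coefficients are nonzero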


I didn’t read enough to catch that. How the heck did they justify that?


Reminds me of an old joke: "What is the difference between an elephant and an aspirin?" - "There isn't any, except the elephant is large, wrinkly and grey."


Another take away (not directly stated in the article but implied): counting the information content of a model involves more than just counting the parameters; the structure of the model itself conveys information.


I think that's an often underappreciated insight.


Love how they misspelled Piantadosi as Paintadosi :)


What is that horizontal bar above r0 in the last equation?


ATCG


What's the purpose of this? Is it one of those 'fun' problems to solve?


This quote might help - https://en.wikipedia.org/wiki/Von_Neumann%27s_elephant#Histo...

Yes, a fun problem, but also a criticism of using too many parameters.



