Hacker News
Why isn't there a replication crisis in math? (jaydaigle.net)
302 points by jseliger on Feb 2, 2022 | 253 comments



The paper here seems to make absolutely zero distinction between deductive and inductive reasoning, which should be the entire point.

Math is an arbitrary framework, built from arbitrary axioms. It is deductive. Thus, all proofs are simply deduction. The knowledge here is positive knowledge, we can show things are true, false, or undecidable. There may be errors, but those are errors in execution.

Psychology is not built on a framework. It is inductive, we are trying to find axioms that map to the data we collect. Thus, all papers are trying to add to/build its arbitrary framework. The only knowledge here is negative knowledge, falsification, we know only what is a failed hypothesis. There will be errors, both in execution, and there will also be statistical errors in experimental results.

The entire point of the replication crisis is that we don't publish or pay attention to results that are boring, so the framework we build is built on skewed data. We don't reject previously popular papers that are now unfalsifiable (the idea that the now unfalsifiable Milgram experiment is still taught in every university psychology dept should be outrageous). The boring results need to be weighted statistically against the interesting results, but aren't, etc. Nobody out there is arguing whether or not the axiom of choice is a true axiom. It sort of doesn't matter, it can't matter, because it's arbitrary by definition.

You can't have a replication crisis inside of a deductive framework without changing the framework. This doesn't happen too often, but we did see this during the shift from Newtonian to Einsteinian physics. The study of the philosophy of science is fairly obscure, but it is at the center of this discussion.


I think you're wrong on some of these points, and the author was trying to make these points but was not explicit.

I remember reading an article in the Mathematical Intelligencer by a mathematician who essentially said there are certain conjectures where, if they were to be proved false, rather than be thrown into a sea of uncertainty, mathematicians would quickly move to investigate a readjustment of basic axioms rather than accept that those conjectures are incorrect.

Then there are fields of mathematics around selecting different axioms. Investigating the ramifications of whether you take the undecidable "continuum hypothesis" as true or false. And then there's model theory and such. Presumably they study models of interest and not arbitrary ones.

You're mostly correct that the methodology is deductive, but the point is that what we choose to use isn't arbitrary, because there are things in math which are more important than the axioms: they are things believed to be "real".


I don’t mean "arbitrary" in the sense that it doesn’t matter to the engineering of a building. I mean "arbitrary" in the sense of unprovability within the framework. Axioms are by definition arbitrary in that sense.

edit: I'll try to expand on this. Math is weird because it's a sort-of in-framework study. We use math when we do physics, and we use physics when we do engineering.

If the physical world ever started disagreeing with our physics, then we need new physics. If the math ever started showing inconsistent results in our physics, then we would need new math.

None of that is to say that Newtonian Physics, say, is "wrong in the sense that it's internally inconsistent", only that "it's wrong in the sense that it is not an accurate mapping of a framework to the world we find ourselves in." The first type of wrongness is the type of wrongness we concern ourselves with in deductive studies, the second type, the mapping-error, is inductive, and requires meta-analysis, and is prone to (Hume's) problem of induction, which means it's ultimately unknowable (i.e. Popper).


Here is where you're wrong:

> You can't have a replication crisis inside of a deductive framework without changing the framework.

You may change the framework, but the replication crisis would be that people don't notice it's changed.

A proof can get so huge and complex that it would take a lifetime of study to understand it. One of the article's good points was that "you can replicate a math paper by reading it", but if you cannot read a proof, you cannot replicate it. If nobody can understand it, nobody can replicate it. There would be a big crisis if people started trusting huge papers blindly and using their results without stating them as assumptions. Mathematicians are largely not doing that, so there isn't really a crisis. But they might! And you would not notice that the framework had become inconsistent for a while. Therein would lie the crisis.

It is nice to think that as soon as the framework changed, people would notice, because surely if the ground moved, that would be obvious! But it is not true. Nothing about the deductive-ness or in-framework-ness of mathematics changes this. It is the same for psychology, the replication crisis is a crisis because people do not notice these results are wrong, and then rely on them in clinical guidelines and affect people's lives for generations.

It is possible to stave off replication crises by using a machine, because mathematics is self-contained and you can use a computer to show that you proved something using only these specific assumptions and they can believe you without reading and understanding it fully. But this doesn't mean mathematics cannot ever have a replication crisis.
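For illustration, here's a minimal sketch of what that machine checking looks like (Lean 4; the toy theorem is mine, chosen only for brevity). The kernel verifies the proof, and you can ask it exactly which axioms the result depends on:

    -- A toy theorem the kernel can certify without a human reading the proof.
    theorem my_add_comm (a b : Nat) : a + b = b + a :=
      Nat.add_comm a b

    -- Reports exactly which axioms the proof relies on.
    #print axioms my_add_comm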

I think your arguments here are best summarised as confusing "replication crisis" with "any published results being inconsistent with previous results". Your idea of "inconsistent" is also strange: the article considers this to mean mistakes in proofs, whereas you mean new mathematics that changes the field by discovery. You're just talking about something completely foreign to the concept of a replication crisis.


Here's where you are wrong:

> But they might! And you would not notice that the framework had become inconsistent for a while.

As the author stated, the math can be reproduced by reading it: ergo, there is no barrier to entry except knowledge. The other sciences do not have this luxury - they require access to specialized instruments, participants, funding, and most importantly - time. Math can be checked nearly instantaneously, so the "might" becomes inconsequentially small.


> Math can be checked nearly instantaneously, so the "might" becomes inconsequentially small.

Can it though? Pickover's 2013 Edge response comes to mind: can hundreds of pages of a proof, based on axioms that require particular expertise to understand, be checked instantaneously? I think it would be easier to replicate the typical psychology experiment than to check the math in that case.

https://www.edge.org/response-detail/23670


A few edge-case examples do not constitute a crisis. The point is not that 0 cases happen in mathematics, but rather that unreproducible results are not the norm there, while in other sciences they are.


You might be mistaking me for someone who stated there is a math replication crisis. I replied to one specific claim.


Also, math is universal. A study can be true in one place and false in another; a proof cannot. This is another barrier to replication.


Math cannot be checked instantaneously. Mochizuki's proof of the abc conjecture, mentioned in the article, was first released in 2012, and despite conferences and lots of time spent trying to figure out what he's saying and whether it's correct, there is still no consensus.


As an aside, as I understand it there is general consensus now that the proof is flawed: https://www.math.columbia.edu/~woit/wordpress/?p=12220


Thanks for the post. This is the first time I've read an article by a math professor written in such an acerbic and relatively hand-wavy way. Seems like over the years this proof must have generated passionate disagreement.


Right, but that's a particularly rare example in mathematics, not the norm.


> no barrier to entry except knowledge

That’s fundamentally it, there’s a barrier to entry, and it’s knowledge. Other sciences may require specialized instruments and time, but math is also subject to the problems of knowledge not being instantaneously transferable. And I imagine, although I’m not certain how prevalent it is, that many branches of mathematics are also dependent on instrumentation these days, in the form of software.

And we all know how software can be…


This scenario actually came pretty close to happening when Andrew Wiles published his proof of Fermat's Last Theorem. The proof was huge, IIRC over 100 pages of maths, some of it pretty obscure, and verifying it took a lot of work by several people.

They even found a flaw in it, but Wiles was able to fix it after a year.


Or now the "proof" of the abc conjecture with the inter-universal Teichmüller theory


I’m talking about an event like the incompleteness theorems, which ended aspirations like those of Russell and Whitehead, or paradoxes that led to new axioms, as in the development of set theory.

Deductive frameworks can’t be wrong in the way that experimental results can be statistical anomalies. The author conflates errors with perfectly reasonable experimental results later shown to be anomalous upon replication.


Again you are talking about the framework as if it is exactly the same thing as the papers that attempt to describe it. It is not. Your statement that "deductive frameworks can't be wrong" is irrelevant to the question of whether a math paper can be wrong. If I prove something, it can be wrong, and it may turn out that I haven't proved it at all, but I can still call it a proof, convince other people it's correct, get it published in a journal, and have other mathematicians rely on it before the mistake is discovered. Papers can be flawed. The papers are not the framework. The reason you think they are the same thing is probably that mathematics has not had that many big mistakes recently, since the renewed effort on solid foundations last century, which means the system is working.

You can't define your way out of the fact that mathematicians will make mistakes and write papers that draw conclusions that are not valid. They are just pretty good at discovering the mistakes, and finding them is culturally essential to the practice of mathematics; arguably if it were not, it would be a big game of Numberwang. Psychologists are, on the other hand, not nearly as good at discovering mistakes. We didn't have to use any properties of the self-containedness of the disciplines to get to that result. It was not necessary to prove mathematically that mathematics does not or cannot have a replication problem. You can just observe it.

(Edit, all of that in brief: if it were truly impossible to be wrong in maths, then why do mathematicians spend so much time trying to verify each others' proofs? Are they wasting their time? Or is it the entire reason there isn't a replication problem?)


I think this discussion is a bit off, moving the root of trust from the mathematicians to the computers is the thing that will not only kill mathematics but also our civilization (and I will try to argue this overstatement now).

People have a hard time understanding things on extreme ends of the scale because they just assume everything is within some reasonable bounds of effort, this is a failure of imagination.

Mathematics is not a reasonable amount of effort.

What's more it is depended on by literally everything else we do (as a civilization).

Theorem proving software (so far) makes the fatal mistake of just trusting: the machine it is running on, the operating system it is running in, and several other things that can subvert assumptions. The correctness of a proof is justified by the trust we place in the heavily audited kernel, which will check the proof script outputted by compilers for the human (textual) interface, but that kernel is just another program on your (mostly) proprietary platform.

Mathematics is a religion that currently dominates the planet (and as with most religions, it is totally ambient to people in that they don't realize to what extent they base their behaviour on it) but it is not without competition and the thing we must realize is that this competition is as motivated, ruthless and mission-driven as anything else on the planet.

Just as we have people sitting around learning to recite the qur'an or the bible from memory we also have universities full of mathematics students reciting proofs and theorems by memory, trying to keep the knowledge that brought us this far alive.

Computer memory is inherently volatile and we do not have the same conventions and procedures in place as we do for safeguarding the legitimacy of written texts. These take time to develop and are usually motivated by tragedies (such as the loss of our "prehistory").

Try to imagine the amount of effort it would take to re-assess every piece of information that we have (at current time) because we accidentally let the entropy into our libraries.


You seem to believe that computer software and hardware has no error detection and correction built into it. This is definitely not true.

The reproducibility of computer-derived knowledge is what makes it something we can trust. Especially if it's being reproduced on disparate hardware and disparate software.

Mathematics is possibly the only thing in the entire world that can _not_ be described as a religion.


> Mathematics is possibly the only thing in the entire world that can _not_ be described as a religion.

Math works, whether you believe in it or not.


The results of the Milgram and Stanford Prison Experiments (the amount of harm caused by studying obedience and harm to humans) were instrumental in giving us the very ethics codes that make them unfalsifiable. That makes them an important facet of the field of psychology's history and worth teaching for that alone. If you believe that ethics code is appropriate, then you believe those studies are valid and don't need to be run again to be re-verified or falsified. If you really don't trust the ethics code which was produced in reaction to the results of these 20th century experiments, then the authentic thing to do would be to re-run the experiments clandestinely and disseminate the results anonymously.


The ethical codes are extremely important, that's not my point.

I'm saying we are taught the results are correct, when, due to the impossibility of reproduction we can't know whether the results are correct or incorrect.

What should be discussed in the Milgram experiment is "we can't know the results of the experiment because one of the subjects killed themselves due to their participation. Try to consider the ethical implications, don't design experiments that might lead to psychological trauma, etc." The idea that the results being true is why we shouldn't do the experiment again is nonsense, because we can't know whether the results are true when we can't replicate the experiment.

Yet, its results are published as a pop-science book: https://en.wikipedia.org/wiki/Obedience_to_Authority:_An_Exp...

Instead, these two studies, if their findings (not the ethics) are discussed at all, are a perfect illustration of the problems that Karl Popper discusses regarding falsifiability.


I think one of the subjects killed themselves due to participation because of the deep inner tension caused by our struggles with obedience versus conscience. The results of the studies bear that out - people largely obey authority unto causing serious harm, that's what the studies tell us, and it's not an inordinate stretch to say that this dissonance causes a person to feel an urge to commit suicide such that, if few people obeyed authority giving malicious orders, they would feel little dissonance and would be way less likely to off themselves. And that's the connection between the results and the ethics. It may not be awfully scientific of me, but I'd yell you out of the quietest room on earth for trying to claim the results have no bearing on that.


"We can't know the results because we can't repeat the experiment," is exactly what I'd yell right back. Repeatability is required for falsifiability, and falsifiablity is required for empirical knowledge, period.


Repeatable and repeated are not the same thing. The Milgram experiment is repeatable in principle. It is falsifiable. The reasons to doubt the claimed results are not because we can't do it again, but because of the details of how it was done.

We also should not do it again.


It's like saying that we can't know whether parachutes reduce injuries when falling from airplanes because we've never had a proper experiment to verify this and most likely won't ever run one.

We can know all kinds of things with a reasonable degree of certainty even if we can't or won't verify them experimentally.


We have, in fact, thrown human analogues out of planes with and without parachutes. There are many ways to experimentally verify things in ways that don't require the use of an actual human - and in fact the development of accurate analogues is its own active area of research so that we can continue to expand experimental research into new areas.


This just seems to push the results down one level - how did we do the experiments to determine that the analogues are accurate?


Some historical studies, some modern studies, on human bodies. By combining the data we have on things like nerve conduction studies, muscular studies, in the living and then experiments on cadavers donated to science, and so on and so forth.


Then why were these experiments done in the first place? I certainly take your point and know the paper you're talking about.

Is it obvious that humans will generally shock people until they are told not to? Is it obvious that randomly assigned students will take on the roles that they are given?

I don't think it's obvious. I think that is why the studies were done in the first place, and I am generally quite skeptical of the results. There is no way to verify them, thus teaching them is genuinely bad science.


> Is it obvious that humans will generally shock people until they are told not to? Is it obvious that randomly assigned students will take on the roles that they are given?

It's certainly not obvious, so verification is informative, and that likely was the motivation for doing the experiments. However, despite their flaws - they were limited and biased in various ways - it certainly would be far, far worse science to base our teaching solely on one's assumptions/scepticism/opinion about how things should be, instead of taking into account whatever limited data these experiments provided. Sure, it would be better to have more and better data, but since we won't get it, this does provide relevant information.


Agreed, and they aren't isolated in terms of seminal studies that probably won't be replicated. No ethics board is going to allow us to strap down a seal and put its head underwater, but that's how we learned a lot about mammalian diving physiology.


The cynic in me answers this with: "Because they got professorships out of it". But then I'm quite cynical


>because we've never had a proper experiment to verify this and most likely won't ever run one.

Isn't it trivial to drop something out of an airplane along with the same thing but with a parachute and measure the impact velocity?


But that doesn't prove anything about how effective they are at preventing injury.


I'm not sure what you mean. We can prove that a parachute allows people to survive jumping out of an airplane with limited injuries. They are effective.


For example, the fact that the impact velocity of some random object is reduced doesn't mean it is reduced from a harmful range to a non-harmful range. It might be equally harmful or equally harmless either way, or there might be additional factors at play in the case of human parachutists that we haven't thought about yet.

We don't have enough data on people jumping out of planes without parachutes to conclude that the parachutes are actually working. They could just be superstition.


That's absurd. The mechanics of gravity, parachutes and jumping out of planes are well understood. Plenty of people have fallen from heights without parachutes, and we know what happens to them and why (acceleration of gravity and the force of impact on their bodies). We also know how parachutes, used properly, mostly prevent that sort of harm (they slow people down enough to prevent excess force when impacting the ground). There's nothing superstitious about it.


Yes, and in a similar manner we can know quite a lot about how people would behave in situations like the one in Milgram's experiment even if we won't ever do an exact replication.


Yes, BUT in similar situations people behave very differently than the ones in Milgram's experiment. This is the whole point of it being not replicable.

That this remains true is due to a) no one being allowed to repeat Milgram's experiment exactly and b) more general scientific funding bodies not allocating money for pure replications.


Sounds like you have a solid hypothesis there with a convincing proposed method of action.


Are you saying parachutes have never been reliably tested?


The paper they're referring to is a quasi-satirical piece about double-blind testing. The idea is that if double-blind studies are required for knowledge about medicine, then we have no evidence that parachutes have any effect on saving the lives of people falling from an airplane, since it's unethical to give people in a testing environment a placebo: they would die.


One aspect that is special to the social sciences and not discussed enough (and was partially mentioned in this thread) is that here science is built on moving ground. Viral papers influence society, and society itself changes due to other factors. Metrics become goals. That's why there is a time constraint on the period when falsification is possible. This continuous-validity constraint is unknown, but it exists for every social phenomenon.


I never thought about that. Like Soros’ market reflexivity, but for science.

Makes a lot of sense.


Yes, economic behavior is a good example of this. There are also countless social strategies that are in constant competition. When you provide "objective" proof of some behavior being beneficial, it becomes a competitive advantage for others to have, a "life hack" that you should adopt. And this influx of alien adopters often corrupts it, rendering it less useful and finally skewing the study results. Meditation, and mindfulness practice in general, is a good recent example of such a thing happening.


Then run the experiments again in secret. Find people who are so passionate about your epistemological model they are willing to take the risk of renegade experiments on themselves. That would be an actual adventure, much better than roleplaying a yelling match at each other in comments sections.

EDIT: Hell, pretend play is for weaklings, I'll be your confederate jumbo-dumbo test taker faker, and let's go a step beyond the original experiment, let's have me actually suffer serious shocks in our Milgram Repro, for your n >= 30 guinea pigs to see. Then we'll know even better and that can be the new standard. And I'll be your confederate convict prime, set aside in especially horrible solitary confinement conditions for the Stanford Prison Repro. And when you watch me and others actually suffer horribly for days on end due to your decisions, I'm sure you'll simply have zero suicidal ideation, and you'll repeat your "we can't know!!!11!!1" drivel.


The Stanford Prison experiment was theater. Milgram was only slightly better. Never mind being unethical, they are just poor science. This is the core of the replication crisis in psychology: theatrical, grandiose claims get all the attention and people leap through hoops to defend them.

https://www.theatlantic.com/health/archive/2015/01/rethinkin...

https://www.vox.com/2018/6/13/17449118/stanford-prison-exper...


If you ask me, all of civilization is a continuous exercise in repeating the Milgram experiment in more and more extreme ways.


> If you believe that ethics code is appropriate, then you believe those studies are valid and don't need to be run again to be re-verified or falsified.

Why? The experiments would still be unethical, even if they led to the opposite result.


Why?

If the opposite result is "some people are prisoners, some are guards, and they all sit around and have a jolly old time" then what's unethical about running that experiment?


> If you believe that ethics code is appropriate, then you believe those studies are valid

But why? The Stanford experiment was performative, it wasn't even an actual experiment, as shown by the experiment notes. How did you come to believe that it is mandatory to believe that study is valid?


> If you believe that ethics code is appropriate, then you believe those studies are valid and don't need to be run again to be re-verified or falsified.

This would only be true if the hypothesis being tested in these experiments was "psychology as a field needs a code of ethics". That wasn't either hypothesis.

"We need a code of ethics" does not imply "Milgram and Stanford prison proved ___ about authority figures".


Exactly, I can't understand what the author was thinking. If a paper shows that 2+2=4, what would the replication problem be? You write the paper out again and 2+2 turns out to equal 5?

"The results can't be replicated" is different from "the logic here is wrong". So different that this article starts from an entirely invalid premise.


From the post:

> More seriously, it’s reasonably well-known among mathematicians that published math papers are full of errors.[1]

I think the author's point is that actually the distinction is less clear: many math papers can't be (easily) replicated, yet people in math aren't too worried.

[1] https://twitter.com/benskuhn/status/1419281164951556097


But there's a difference between "can't get the results a second time" and "didn't even get the results the first time".

There isn't even so much a notion of "results", just "logically, Y follows from X". If that's wrong, you don't need to run the experiment again (there is no experiment to run), the logic is just flawed.


I don’t think it’s quite that simple. A math paper generally says 3 kinds of things: X is true, this is the high level idea of how we can prove X is true, and this is the detailed proof that X is true. As the post mentions, usually when there’s a mistake, X is still true.


Well, when there's a mistake found in a proof, the mistake needs to be rectified before the proof can be accepted. There have certainly been proofs that X is true, where X has turned out not to be true based on the fact that there's a mistake in the logic.

While it's possible a mistake won't be found, it's a much different problem than running empirical experiments and getting different results.


Right -- you need to run the argument again.


I agree with you. Don’t math proofs build on each other on the assumption that previous results are true? So if you have an incorrect conclusion, you might end up building an invalid map of related results until you find it conflicts (or you’re a superstar who triple-checks the meaningful dependencies you think might be particularly poorly scrutinized). In spirit, that’s not unlike scientific lines of inquiry in other disciplines that compound on the errors of previous results (aka the replication crisis).

It’s obviously quantitatively different in very important and meaningful ways, but the author I think is drawing an apt analogy.


It doesn't matter much beyond semantics. Math papers aren't experiments so they can't replicate.

However, math papers can still be wrong! Especially the actual proofs can be wrong. The question is, how many wrong proofs are out there, and how many of them have false conclusions.

If we accept that most published proofs have errors, why isn't that a horrifying failure of Mathematics that shakes fundamental trust in its systems? That is what the article is about.

This hypothetical shaking of fundamental trust would be highly analogous to the replication crisis. And there are more parallel arguments. So to me the premise makes sense, and I see little problem with some semantic wiggle room for sake of analogy.


Why should the Milgram authority experiment not be taught? If nothing else it's suggestive and might inspire new studies. The fact that the original design wouldn't pass a modern IRB doesn't mean the phenomenon is forever closed to investigation


Yes, of course, the Milgram experiment can be taught as a tragedy and a lesson in experimental ethics. But its "correct results" are almost always taught, even being published as a pop-science book, when they cannot be known because the experiment is too ethically problematic to be replicated in a modern setting.

Ironically, the "correct results" are often cited as the reason for the tragic suicide of one of the participants.


I'm just having trouble with "almost always"

Any psychologist worth their salt knows better than to claim any interpretation of experimental results as unassailable

Pop science is another story (literally) but the profs I've had the privilege of studying under spent a lot of time warning us against the shoddy reasoning you describe

Psychology has a lot of bullshit in its orbit but the discipline as taught in quality departments is just that, disciplined


I don't mean to impugn the study of psychology in the slightest. Only that I find it maddening that obviously unfalsifiable results are taught as though they can teach us anything about human behavior.

I hope you are right and I am wrong. I was taught these experiments this way at a good school; I can only hope that is more uncommon than I think it is.


Fair enough- and agreed. I hope so too.


I couldn't find anything about a participant's suicide when searching, could you provide a link?


I remember that from college but maybe I’m misremembering or maybe I’m simply wrong.

After a brief look around, I also couldn’t find anything about harm to the participants.


Nobody was harmed in the Milgram experiment. That's the whole point of the experiment. It is to fake harm. The one seemingly receiving the electric shock is an actor faking pain.

https://en.wikipedia.org/wiki/Milgram_experiment


I’m talking about the people who believed they were shocking the actor. That experience could easily create mental trauma.


If you replace math with (experimental) physics, you get an inductive framework, yet physics doesn't have the same problems psychology has.


Does the behavior of a particle depend in part on every other experience the particle has had in the past? Physics has things that we can't figure out, and things that have to be observed in the aggregate, but it's much easier to perform multiple experiments with the same base conditions.

If we could clone the participants of a psychology experiment prior to the experiment, with their memories and personalities intact, we could possibly narrow the gap. Or maybe not! But that would be a fascinating finding! ;)


Physics has calculable observation biases baked in. At least you can create experiments where you knowingly limit those biases based on some serious fundamentals. All other fields have so much more to worry about, it really isn't fair.


I think one of the major reasons is that physicists actually understand math and statistics and from the first year of university learn how to design, run and analyze experiments properly and quantify uncertainty.

This is a big difference to humanities or even biology.


See https://en.wikipedia.org/wiki/Oil_drop_experiment#Fraud_alle... and the following section. Physics isn’t immune to it.


The fraud allegations are bogus (according to Wikipedia). "Millikan's experiment as an example of psychological effects in scientific methodology" is in itself a psychological study. Has this Wikipedia source been peer reviewed and replicated? This is quibbling about fractions of a percent of the secondary measurement, the primary result being the quite obvious quantization of electric charge. And! Ultimately the charge measurement converged on the result known today within 10 years of his original experiment. That's what immunity looks like.


The replication crisis is not at all about some papers not being replicable or being fraudulent - the crisis is about the issue that in certain domains a huge proportion of the papers (e.g. half!) turn out to be not replicable and actually false.

I'd consider it appropriate to say that physics is indeed immune to a replication crisis unless it turns out that a significant proportion (e.g. more than 5%, one order of magnitude less than for psychology) of physics papers fail replication.


> the now unfalsifiable Milgram experiment

Total nonsense. Experiments are never falsifiable, what does that even mean? Hypotheses are falsifiable. Did any hypothesis of Milgram's suddenly become unfalsifiable one day in the last 50 years because the psychology research community found a new moral compass? Did the foundations of what is knowable by science shift during the night? "Now unfalsifiable" implies change. So what do you think changed, exactly?

You seem really hung up on a strict Popperian falsificationism that you have misunderstood.

In any case, the article mentions that you replicate a proof by reading it and checking the steps for yourself, so I feel your criticism about deductive reasoning is misplaced. It's implied that soft science papers require experiment to reproduce because they are inductive, just because the author doesn't spell this out for you doesn't mean they missed the distinction...


> Did the foundations of what is knowable by science shift during the night?

Yes! The moment the approval committee decides that certain experiments are no longer allowed, that makes some results no longer reachable by science, or at least not in a reasonable manner.


How far "science" has fallen, if this is how people now define the word.


This doesn't make sense. This is actually addressed in the piece. If inductive reasoning is irrelevant for mathematics, why then do mathematicians have excellent intuitions about what is true even before they manage to prove it? To quote the essay:

"[D]espite the fact that error-correction is really hard, publishing actually false results was quite rare because “people’s intuition about what’s true is mysteriously really good.” Because we mostly only try to prove true things, our conclusions are right even when our proofs are wrong."

Deduction can't explain this surprising reliability of mathematical intuition, only induction can. The question the author tries to answer is why the intuitions of mathematicians seem to be more reliable than those of social scientists.


I mean, again, this is a very messy thing to talk about explicitly because it's playing fast and loose with language.

I think the fact that our brains are computing/pattern-recognition machines is the obvious reason computing (deducing) is 'easier'. However, human minds are not actually mysteriously really good at math; that's just something academics tell themselves. Our brains are very clearly broken at some types of mathematical problems: gambling, large numbers, power laws, etc.

The number of times I've had to explain to my friends and family how exponential growth "will appear" with the spread of covid is testament to the fact that our brains don't actually have that math built in, probably because it wasn't particularly important for evolutionary survival and reproduction. The entire field of behavioral economics studies this delta specifically.
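To make that concrete, a minimal sketch with made-up numbers (a hypothetical 3-day doubling time, not real epidemiological data): constant doubling turns one case into roughly a thousand within a month, which is exactly the part intuition tends to miss.

    -- Hypothetical scenario: cases double every 3 days, starting from 1.
    cases :: Int -> Double
    cases day = 2 ** (fromIntegral day / 3)

    main :: IO ()
    main = mapM_ report [0, 6 .. 30]
      where
        report d = putStrLn (show d ++ " days: ~" ++ show (round (cases d) :: Int) ++ " cases")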

I'd say the deductiveness of a problem, itself, is what makes people have the intuitions. It's vastly easier to use deduction than it is to do induction... because induction is ultimately unknowable and effectively solipsistic.


Humans in general may be prone to all sorts of erroneous mathematical reasoning but the speed and consistency with which mathematical concepts reproduce in the minds of mathematicians can justifiably be called mysterious imo, at least until we better understand human language faculties


By your dichotomy, Physics seems closer to Psychology than math. Why aren't there replication crises in Physics? My answer is that there always have been-- it's just that they get resolved quickly. The problem with Psych is that it is SLOW.


There are replication problems in physics: http://astrobiology.com/2020/10/no-phosphine-in-the-atmosphe...

The entire point of the replication crisis is that we should expect statistical anomalies to occur, and the problem is that anomalous results are wildly more likely to be published. Which is why you probably hear about evidence of life on Venus but not about calibration issues.

It’s just generally easier and more problematic to do experiments testing framework boundaries in sciences with more variables like medicine, or anything to do with behavior.


I would assume that with physics, unlike psychology or anything that involves biology, it's generally much easier to eliminate variables.

Testing a new drug, for instance, is very difficult because there are about a billion chemical processes going on in the human body (or whatever body is being tested), and trying to distill the effects of adding one more, and making sure that the effects are correctly attributable to the right causes is something we can almost certainly never know 100%.

This kind of thing certainly happens in physics, and we can see results like neutrinos appearing to travel faster than light until they are corrected, but it's usually much easier to isolate the processes of interest in a physics experiment than in a biological system, or at least easier to eliminate most extraneous effects.


Because it is "trivially" easier to do a better experiment in Physics (money aside, of course)

Psychology will always have ethical, sample-size, and cultural/time/space constraints. Your "psychology experiment" done with Mechanical Turk is more an exercise in sampling bias than anything else

And they keep running into "replication failures" because they fail to account for that and try to chase the illusion that it is possible to have a perfectly controlled environment in psych. They can't, and they're only fooling themselves that they can.


No - both you and the OP miss the real problem.

The problem with Psych is that you can't really experiment on humans in a meaningful way. Animal studies for example don't suffer the same replication problems as psych.


> The problem with Psych is that you can't really experiment on humans in a meaningful way

While absolutely valid, this isn't the only problem: one other complication is that conclusions from experimental studies often don't apply to the real world, i.e. people behave differently in experimental settings. Thus in any case you can either have good control over confounding variables but very limited scope (experimental studies) or bad control over the setting but the ability to observe real-life behavior (field studies).

But it gets worse: two people with identical behaviour can have quite different motivations, and thus identical observable behaviour can result in quite different consequences. E.g. 5 hours a day playing games can be either healthy behaviour or an addiction/a means of suppressing thoughts or memories, depending on the person. So you'll always have that massive confounder called the mind, in every setting. Which means you have to inspect the mind itself, which only works indirectly, i.e. by interacting with the person, asking questions, trying to find out the motivation behind a particular action. And the person often doesn't really know, because a particular decision probably wasn't chosen consciously.


I've literally been experimented on and it was meaningful. The problem, in my view, is low sample sizes, poor selection[0], hyper-dimensional confounding data, and low rates of application. With math, comp sci, or physics, reality hits you in the face when you start to actually use it to build rockets or what have you.

[0] Guess when I was experimented on? Of course it was during university and of course I'm a white english speaker that lives in North America.


Well perhaps if animal studies were primarily concerned with investigating the interaction of reason and emotion in animals there might just well be :)


You may not have read very far into the post, because I think it deals quite well with these issues, if not in the same terms.


If you're actually interested in this topic, you should read "What is this thing called science?" by Alan Chalmers. One big take away from the book is that there are many things that are "factual" and "scientific" but are not "falsifiable".

It's been an important part of modern scientific advancement that falsification is a good, but not perfect way of learning about how the universe works.

For example, macro-evolution is basically unfalsifiable, and yet it is generally regarded as good science.


I don't believe I'm familiar with an Ingram experiment, any chance you could expand on this? I was also unable to find something with a quick internet search, however Milgram was suggested to me.


Milgram experiment, sorry... must have got hit by autocorrect there: https://en.wikipedia.org/wiki/Milgram_experiment

You can add the Stanford Prison Experiment (Zimbardo) to the list of, now unfalsifiable, experiments that inexplicably are regularly taught in university settings when they cannot possibly provide any useful data to the sciences: https://en.wikipedia.org/wiki/Stanford_prison_experiment

The idea that many if not most universities create educational programs based on these very obviously problematic studies leads me to believe that the scientific community can be as guilty of info-tainment bias as the general public. I only wish that I'd been able to continue in academia so that I could have at least a small voice in changing this problematic dynamic we live with.

They literally made a movie about the Zimbardo experiment in 2015. It couldn't be more obvious that people care more that it be true than whether it might be false: https://en.wikipedia.org/wiki/The_Stanford_Prison_Experiment...


what do you mean when you say "now unfalsifiable", is that because the experiments themselves are too unethical to be repeated?


The Zimbardo experiment, at least, has actually been falsified, so I don't think it's right to call it unfalsifiable.


That's correct, however students are being told about the manipulation these days. Especially concerning the prison experiment.


I think the author makes the distinction in the beginning.

> In experimental sciences, the experiment is the “real work” and the paper is just a description of it. But in math, the paper, itself, is the “real work”.

In other words, there is no replication crisis in math because there is no replication to be done. There is no experiment to be replicated, just work to be checked for correctness.


> the idea that the now unfalsifiable Milgram experiment is still taught in every university psychology dept should be outrageous

I agree with this point in general, but the Milgram studies have been replicated many, many times so potentially not the best example.

The Stanford Prison Study, on the other hand...


You could also argue that Math had its replication crisis in the 17th-19th centuries. E.g. infinite series "proofs" that were eventually shown to be flawed methodologies.

This and other crises led to grounding modern mathematics with set theory, the Zermelo–Fraenkel axioms, etc., and understanding what's possible (e.g. Gödel's theorem).

Psychology and other social sciences are barely a century old.


Other examples:

- The "Italian school of algebraic geometry" of the 19th century used intuitive methods that, while groundbreaking, ultimately proved to be unreliable and generated many false results. https://en.m.wikipedia.org/wiki/Italian_school_of_algebraic_...

- Mochizuki's abc "proof", which many of his Japanese colleagues seem to believe, but which most everyone else considers fatally flawed. https://www.math.columbia.edu/~woit/wordpress/?p=12220

Mathematicians have definitely gotten out ahead of their skis in the past, but I have the impression that the community today is incredibly good at finding flaws in flawed work and making solid work fully explicit and rigorous. It can take years or decades though.


Mathematics is having a replication crisis and people pay so little attention they don’t know.

That replication crisis has led to efforts in formal verification such as HoTT, Lean, etc.

https://homotopytypetheory.org/

https://xenaproject.wordpress.com/2021/06/05/half-a-year-of-...


No, by and large mathematics is not having a replication crisis. As the blog post in your link states:

> Question: Was the proof in [Analytic] found to be correct?

> Answer: Yes, up to some usual slight imprecisions.

This has been the case for almost all math formalization efforts. Even when (very rarely) proofs were revealed to be incorrect, the result was salvageable.


Note that fixing the "slight imprecisions" in a typical ad-hoc proof is still very valuable. Quite often it leads to a simpler, easier to understand, tighter, etc. argument as the proof can be freely "refactored" without changing its validity. This is the underlying reason why formalizing a proof that hadn't been done previously is considered actual, publishable work.


Not rarely. This talk shows several examples: https://www.youtube.com/watch?v=E9RiR9AcXeE

One of the more important results discussed was not salvageable.


> Not rarely. This talk shows several examples

I think you're underestimating the amount of theorems now that have been verified without issue. Those several examples are an absolute drop in the bucket.

> One of the more important results discussed was not salvageable.

I don't see any important result in that talk that was not salvageable. I see a lemma that had to be changed in support of a larger theorem, but again nothing that "broke downstream papers" so to speak. All the results ended up being fine for their purposes.


That's because most math proofs are treading on well-understood ground and only extending it slightly. E.g. it would be like a psychologist asking how the results of a well-proven experiment would differ if all participants wore red shoes.

When you enter truly new grounds mathematicians don't even agree if the distinctions being made have a meaning, let alone if they are true.


> When you enter truly new grounds mathematicians don't even agree if the distinctions being made have a meaning, let alone if they are true.

What examples are you thinking of? I can think of only one case where this has been the case (Mochizuki and the ABC Conjecture), but that turned out to have so much fanfare precisely because mathematicians did not agree on what distinctions were being made and this was the only time in living memory that this had occurred (the general consensus is that Mochizuki's proof is simply too obfuscated to make heads or tails of). As such the ABC Conjecture is not considered solved.

However, that is almost always not the case, even on the cutting edge of mathematics and even when making other breakthrough discoveries (e.g. Fermat's Last Theorem).

> E.g. it would be like a psychologist asking how the results of a well-proven experiment would differ if all participants wore red shoes.

And to be clear the major reason why fields like psychology are termed to have a "replication crisis" is because well-known results are being overturned, not just cutting-edge ones.

EDIT: I see now that OP also refers to the ABC Conjecture.


>What examples are you thinking of?

Back in grad school I was very much into types which let me do things sideways to how most mathematicians did them.

The worst one was the derivative of a derivative - NOT the second order derivative. Using R for the real numbers, the type signature of first (and any) order derivative is (R -> R) -> (R -> R). The type signature for what I was talking about was ((R -> R) -> (R -> R)) -> ((R -> R) -> (R -> R)). It didn't have very many interesting properties I could find but trying to explain it to anyone else in the department was like pulling teeth. They'd start thinking about second order derivatives every time and there is no common notation for talking about third order functions like there is for first order and (some) second order functions (derivative, integral, transforms, etc).
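A minimal sketch of the types in question (Haskell-style, with names of my own invention): the point is that an operator on operators is a different type from merely applying an operator twice.

    type Fn     = Double -> Double  -- R -> R
    type Op     = Fn -> Fn          -- (R -> R) -> (R -> R), e.g. the derivative
    type MetaOp = Op -> Op          -- the third-order signature described above

    -- A numerical derivative, as an Op (finite differences, illustrative only).
    deriv :: Op
    deriv f x = (f (x + h) - f (x - h)) / (2 * h) where h = 1e-6

    -- `twice` is a MetaOp; note that `twice deriv` is just the SECOND
    -- derivative (an Op), whereas `twice` itself is the third-order object.
    twice :: MetaOp
    twice op = op . op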

Mathematics only works as well as it does because mathematicians are all working on pretty much the same thing, not because there is some innate quality in mathematics that pushes mathematicians towards truth.


> The worst one was the derivative of a derivative

What you are talking about already exists, in many different versions at that. I think you really underestimate how varied the objects mathematicians work with are: they work with so many things that it is hard to come up with original ideas, and this one was already explored close to 200 years ago.

https://en.wikipedia.org/wiki/Fractional_calculus#Fractional...
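For reference, a standard worked example of what that article describes (the Riemann–Liouville half-derivative of the identity function; applying the half-derivative twice recovers the ordinary derivative):

    \frac{d^{1/2}}{dx^{1/2}} \, x = \frac{\Gamma(2)}{\Gamma(3/2)} \, x^{1/2} = \frac{2\sqrt{x}}{\sqrt{\pi}}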


The type of that derivative is still (R->R)->(R->R).

I also have a PhD in maths, thanks.


It isn't: it takes the derivative of the derivative operator, not just applying the derivative to a function several times. Fractional calculus is the study of viewing the order of derivatives as a continuous quantity rather than discrete, so there, taking the derivative of the derivative operator makes sense.

You could very well mean something different than what that article is talking about, but it isn't like I am just saying that a second order derivative is the same thing as what you are talking about. But if you mean something different than "the derivative of the derivative operator", then you aren't very good at explaining what you mean; you have to be more precise, as math is a wide field, and if you describe an object imprecisely then people will misunderstand.

> I also have a PhD in maths, thanks.

You yourself argued that people with PhDs in math get this wrong, so I don't see why you would bring this up. But the most likely scenario is that you just failed to communicate your thoughts properly since you used imprecise language. Maybe mathematicians should be taught more how to be precise with their statements, but usually they can rely on the crutch of old notation. I did invent new notation and solved some old unsolved problems that way in grad school, and the other mathematicians had no problems understanding what I wrote, so from my experience mathematicians have no problem understanding new things.


It IS unusual to worry about type signatures as a mathematician. Sounds to me like your PhD topic was closer to computer science than math? Or at least constructive mathematics ;-)


Mathematical physics oddly enough.

I got tired of having units mismatch in physics equations so I picked up a ton of type theory and used it for everyday work. Using it, rather than talking about it, has given me a very different perspective on it than anyone else I've talked to.


If you keep walking down that path you'll soon be labelled a crank by the old guard.

Developing a practical rather than a theoretical understanding of type theory rapidly makes you express intuitions which other people can't lex/parse.

Because you begin to think in functors/compositions, i.e. constructively; and as you've already pointed out, many of those functors don't have corresponding English nomenclature in a classical setting.

Find some Category Theorists to talk to instead.


> Find some Category Theorists to talk to instead.

Most mathematicians are also category theorists today. They were the ones who invented category theory in the first place, and today it is a basic topic most take in grad school and then use just about everywhere.


Very interesting!


>It didn't have very many interesting properties I could find

Usually notation is defined for higher order functionals when the need arises - when a new result is found to have interesting properties. Topological proofs on function spaces can use third order functions.

> Mathematics only works as well as it does because mathematicians are all working on pretty much the same thing

I don't think any person alive can understand all the major branches of mathematics well. By that measure mathematics is very broad.


So like, a functional derivative of differentiation?

Except like, instead of the 2nd order function returning a number, where the functional derivative would be ((R -> R) -> R) -> ((R -> R) -> R), uh, instead, the thing you got.

Ok, but, differentiation is linear, so

    \frac{1}{\varepsilon} \left( \frac{d}{dx}\bigl(u(x) + \varepsilon \eta(x)\bigr) - \frac{d}{dx}u(x) \right) = \frac{d}{dx}\eta(x)

(and so there's not even much need to take the limit as \varepsilon goes to 0, as it doesn't depend on \varepsilon anyway)...

Uh, that seems a little odd, that the derivative of differentiation in the direction of a function would be the derivative of that function? Like, the derivative of something linear should be a constant, ah, but, the directional derivative of 5 x_1 + 3 x_2 also depends on the direction in which it is taken, though it doesn't depend at all on where it is taken.

Ok, yeah, same situation here then.

Is that the sort of thing you are talking about?

Though, you said "of a derivative" not "of the derivative", so I guess maybe you mean like, more generally things that satisfy Leibniz's law?


This pretty much nails it.

If everyone commits to using approximately the same definitions then (at the social scale) you get normative semantics as an unintentional by-product. Everybody in your tribe uses the same terminology/notation and means the same thing by "==".

It's a desirable property because it minimises miscommunication and it enables effective asynchronous communication. Good luck trying to read a paper by somebody using different semantics.

And then computer scientists come along and point out that all definitions (and by proxy - all Mathematics) are arbitrary.

Because the chosen axioms are arbitrary.


Here's [1] the lecture where Vladimir Voevodsky talked about the problem and his experience with it but like the blog says, he didn't and they don't consider it a crisis. Even HoTT (and other TT) people present it as how things could be much better, not about how things are terrible.

1. https://youtu.be/E9RiR9AcXeE


I worked in applied math, in industry — as the math engine guy supporting economists.

The trials and tribulations of abstract math, where the little errors turn out to (mostly) be salvageable, translate into billion dollar mistakes in mathematical models in industry — errors that have caused a hiring freeze at a major tech company.

I respect that other people feel differently — I personally think there is a crisis in mathematics, where the mistakes/errors of Voevodsky et al. are the tip of the iceberg, but the most visible part, since they happen in academia versus industry.

I hope everyone can agree that:

a) it’s currently hard to verify mathematical models and proofs; and,

b) we could make that better — and currently are working on it.


But that's not a crisis of math, it's a crisis of business trying to use math that is not made for it. It's the Post Office Scandal all over again. Overreliance on shitty models, lack of quality assurance, etc. (Or if we look at it like the O-ring debacle of the Challenger accident, or the usual problem of one-off creations, like the Hubble, JWST, or any other public project that is late and over budget. And the F35 is also in this category, because it's also the problem of forcing a model on people that has so many free parameters that it's an endless argument about nothing.)

https://en.wikipedia.org/wiki/British_Post_Office_scandal


You’re trying to correct me about something I experienced first hand in my career — and you’re coming across as ignorant.

What you lament as “business trying to use math that is not made for it” is precisely the problem:

Mathematics can’t be reliably applied, even if you hire a dozen world class PhDs, hundreds of software engineers, and let them spend years trying to make it work.

That’s a crisis.


I think for there to even be a crisis there would have to be consensus on that.


How is math experiencing a replication crisis?

> This site serves to collect and disseminate research, resources, and tools for the investigation of homotopy type theory, and hosts a blog for those involved in its study.

> Exactly half a year ago I wrote the Liquid Tensor Experiment blog post, challenging the formalization of a difficult foundational theorem from my Analytic Geometry lecture notes on joint work with Dustin Clausen.

???


A more recent example is the Italian school of algebraic geometry, where it was discovered in the mid 20th century that many claimed proofs were faulty (and some “proven” results were incorrect).


But then again, molecular biology is younger than psychology and still doesn't have replication problems of the same order of magnitude.


Molecular biology has a big problem with people faking results. Even when the fakery is pointed out, journals are very, very slow to do anything about this.

Elizabeth Bik has put a lot of work into finding faked images in published papers. Her Twitter feed is pretty entertaining; she presents images and challenges readers to find the duplications. https://twitter.com/MicrobiomDigest

For more, see Nature: https://www.nature.com/articles/d41586-020-01363-z


Despite the title (a title with a question in it invites people to comment without reading the post, even more than the usual already high level), this is a really good post IMO, with valuable insights into not just mathematics but also the replication crisis elsewhere. (And it does discuss Mochizuki's claimed proof of the abc conjecture, and links to MathOverflow on formalization with proof assistants, to a recent paper discussing Vladimir Voevodsky's views, etc.) This from the first part is sound:

> The replication crisis is partly the discovery that many major social science results do not replicate. But it’s also the discovery that we hadn’t been trying to replicate them, and we really should have been. In the social sciences we fooled ourselves into thinking our foundation was stronger than it was, by never testing it. But in math we couldn’t avoid testing it.

But the post doesn't stop there! The second part of the post (effect sizes etc), with the examples of power posing and the impossibly hungry judges, is even more illuminating. Thanks!


This is a thoughtful and thought-provoking blog post. I think it's worth asking similar questions of computer science. I think you'll find some math-like patterns -- there's basically no chance Quicksort or any other fundamental algorithm paper will fail to replicate -- and some patterns which will fail to replicate, like in software engineering.

Some of the early results on pseudorandom generators and hash functions aren't holding up well, but I think that's just progress. We understand the problem a whole lot better than we did back then.

Perhaps more interesting is the literature on memory models. The original publications of the Java and C11 memory models had lots of flaws, which took many years to fix (and that process might not be totally done). I worry that there are a bunch of published results that are similarly flawed but just haven't gotten as much scrutiny.


There was that time when it was discovered that nearly all published binary searches and mergesorts were wrong. [1]

And yet, the concepts of binary search and merge sort are fine.

I think that's quite similar to the situation in math papers? Because math isn't executable, a math paper being "significantly" wrong would be like discovering that a program uses a fatally flawed algorithm and is trying to do the impossible. It can't be fixed.

Programs that can't be fixed seem rare?

[1] https://ai.googleblog.com/2006/06/extra-extra-read-all-about...
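
For the curious, the actual flaw in [1] is a one-line overflow. A minimal sketch (Python's integers don't overflow, so the 32-bit wraparound is emulated here):

    # the classic bug from [1]: in a 32-bit language,
    # mid = (low + high) / 2 goes negative once low + high > 2**31 - 1
    def int32(x):
        # emulate 32-bit signed wraparound (Python ints are unbounded)
        x &= 0xFFFFFFFF
        return x - 0x100000000 if x >= 0x80000000 else x

    low, high = 2_000_000_000, 2_100_000_000  # both valid 32-bit indices
    buggy_mid = int32(low + high) // 2        # negative: out-of-bounds access
    safe_mid = low + (high - low) // 2        # the standard fix
    print(buggy_mid, safe_mid)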


Rather than "wrong", I would describe those implementations as "not fully general". They work perfectly when `n < some large number` as opposed to `n < some large number * 2`. The latter is the best you can do with the function signature, but that is somewhat arbitrary. You could easily choose a 64 bit index and exceed all practical needs.


Many of these examples aren't really in C but in C-like pseudo-code. In that domain they are perfectly accurate. Even in C, 'int' bitlength is defined by the implementation: int only needs to be 16 bits or more, but it could easily be enough bits to exceed the number of known atoms in the universe, in which case overflow is impractical.

I'd say that's an example of the difference between science (the authors don't need to show every detail of a practical implementation and can assume infinite bits in int) and engineering, where you do need to make such considerations.


From my experience in ML, I'd suspect that the "crisis" isn't that the research is false so much as it's useless (algorithm x with parameter set w happens to work well on one particular dataset, conclusion: I have revolutionised the field).


This isn't unique to ML. A lot of research is about adding an epsilon to an existing paper, which probably doesn't interest anyone except a small community working on its very own niche topic.

But does that mean there's a crisis? Maybe that's just a way to foster an environment that will let great ideas emerge.


I’d rather be in the world where we have too many papers tweaking the details of power posing and exactly measuring how much each contributes to the effect. At least we’d know the effect is real.


This course of events doesn't play out in math, though.


The parts of CS that are the most math-like (which include fundamental algorithms) don't have a replication crisis, but the ones that are the most social-science like probably do, or would. I would bet large sums of money that a lot of the literature on stuff like "does OOP lead to better code", "does IDE syntax highlighting lead to fewer bugs" etc. would fail to replicate if anyone bothered trying.

The thing is, the general sense I get is that people in CS already have so little confidence in these results that it's not even considered worth the time to try and refute them. Which doesn't exactly speak well of the field!


I worry about ML papers in particular. Models are closely guarded, and often impractical to train independently because of ownership of the training/test sets, the computing power required, or details left out of the paper. There's no way to mathematically prove any of it works, either. It's like social science done on organisms we've designed.


> there's no way to mathematically prove any of it works, either

Or there is, but then you're doing statistics not just ML.


Some measurements are interesting and valuable without being replicable. For example, the number of online devices or the number of websites using WordPress. Take the same measurement at a later point in time and the results are different. Yet I wouldn't call those fields maths-like.


Research into this stuff is very young and so I think it's fair to be skeptical of the results. I'm hoping we'll eventually come up with more rigorous, reproducible results.


Anything math-provable is objective. Anything depending on human opinions, like "what's better code", is subjective, and thus suffers from replication problems.


In competitive programming you could basically assume the pseudocode in a paper is not literally correct and requires some tweaking to work, despite a “proof” of its correctness. Particularly with string algorithms.


long time no see!

there's a couple levels there:

rote translating pseudocode into your target language isn't likely to pan out well.

so instead you run the pseudocode in your mind, develop an intuition on how it works, and that's the "replication" bit this post talks about with reviewing math papers.

but both the pseudocode and your code will likely have edge cases you didn't handle. this isn't a problem for math - that's the category of common trivial/easily fixable proof errors that don't really affect the paper. but they're a problem for machines that run them literally.

maybe a good compromise strategy for formal verification is to declare the insight of the algorithm - recurrence relation or whatever - as an axiom, and then use the prover to whack the tricky edge cases.
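
something like this toy Lean 4 sketch (my own invention, with a textbook Fibonacci identity standing in for the "insight"):

    -- toy sketch of the compromise: take the key insight on faith as
    -- an axiom, then let the checker grind the mechanical parts
    def fib : Nat → Nat
      | 0 => 0
      | 1 => 1
      | n + 2 => fib (n + 1) + fib n

    -- the "insight": a known Fibonacci addition identity, axiomatized
    axiom fib_add (m n : Nat) :
      fib (m + n + 1) = fib (m + 1) * fib (n + 1) + fib m * fib n

    -- concrete edge cases still get checked mechanically
    example : fib 0 = 0 := rfl
    example : fib 10 = 55 := rfl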


Yes, I'm convinced tons of published results are flawed! I heard top researchers tell their students "don't spend too much time on the proofs, nobody reads them". And many CS papers don't get a lot of attention. But it's not necessarily bad; other researchers build on top of this work and results consolidate over time.


Isn’t this a misunderstanding? I suspect they rather meant to avoid spending too much time on language in these parts.


No, it's not. In that specific case, the supervisor thought the value of the paper didn't lie in the proofs, plus it was a rank-B conference. He would rather have his student work on a different paper than spend a week on the proof.


Math doesn’t have a replication crisis at all; it has a comprehension crisis.

Since proofs are programs one can basically say that mathematical theorems are incredibly detailed software that is completely open source and invites people to identify programs that don’t work and or fix issues.

A famous one is Fermat's Last Theorem, which needed a fix but was largely right.

Others have said that it takes 6 months to a year to get published. The other thing with math is the fact that you can get completely scooped and your work is worthless.

Edit: I am using "proofs are programs" very loosely and yes Theorems are much more than programs as other commenters have pointed out.


A more notorious example of the comprehension crisis would be Mochizuki's claimed proof of the abc conjecture. So far fairly few people are willing to claim they both understand and agree with the several hundred pages of 'proof'.


I was tempted to use that but Fermat's last theorem is known to the general public for much longer and has a resolution.


It was a good example, and definitely one of the more important examples of how finding and fixing mistakes in mathematics is supposed to happen. I figured the abc-conjecture would provide a nice contrasting example.


https://en.wikipedia.org/wiki/Shinichi_Mochizuki

It's a fun rabbit hole to go down :)


The math department at my university tried to read through the proof of Fermat's last theorem, as a for-fun activity. They eventually gave up because they realized it would take too much time.


Interesting! My experience is that scooping is less of an issue in math than in any of the science fields I have friends in. Papers are lower-stakes, there's less money involved, and if two of you are working on the same project you can just co-author.

(And if you have an independent paper, that can _also_ get published; your paper is distinct even if the result isn't. I think the HOMFLY-PT polynomial was independently proven in something like four different papers published within two years, and it's named so that all eight authors get credit.)

But also, publication lags shouldn't lead to more scooping, because you can put it up on the arXiv at the beginning of the publication process, not the end. In my experience the paper is treated as "real" once it hits the arXiv; the acceptance is mostly a formality that lets us put it on our promotion packet.



> Since proofs are programs one can basically say that mathematical theorems are incredibly detailed software that is completely open source and invites people to identify programs that don’t work and or fix issues.

I don't think it's that straightforward; proofs in papers are a mix of natural-language explanation and mechanical steps. Not every step of deduction can feasibly be written out. That's part of why computer-aided proofs are not that popular in math.


Having a paper published can take a very, very long time; a year is quite short, and 6 months is basically the minimum wait. My last paper took 2 years to be accepted from the first time I submitted it, despite being largely the same as the initial submission, accepted as part of my thesis over a year ago, and having several citations. It is very frustrating, and it also means that easier (and less original) work is easier to publish.


An interesting perspective on programs-as-proof is I-forget-his-name-but... the mathematician who made really bold claims, if only you'd study under his tutelage for a number of years to understand this whole new terminology he invented.

With programs-as-proof it really wouldn't matter. It's either "computer says yes" or "compu'er says noooo".

EDIT: Whoop, sibling post mentioned, it's Mochizuki.


This doesn't seem a very generous description of Mochizuki's work. You don't "need" to study under him for a number of years and there's no evidence he's being obscurantist. The proof is long and has a lot of novel techniques he's invented, and he works primarily in Japanese. You can reasonably side with e.g. Scholze's interpretation without thinking Mochizuki is disingenuous or some kind of scammer.

He's considerably less esoteric than e.g. Grothendieck was even during his more "public" years.


I have nothing invested in whether or not any given mathematician is right or wrong. I just picked a random example of a controversial proof -- the point was more that proof-as-computation could settle any and all disputes.

It might not lend more understanding to people not invested in "field X" (or even people who are invested in field X!), but it would be proof.

Proof in the current world of math is quite intangible.


I think the demand for "tangible" proof (if by that you mean, fully mechanically-verifiable proofs and a style unlike anything common in mathematics papers today) is a bit silly and seems to be driven by some ideologies well outside of mathematics, rather than mathematicians themselves.

A proof is whatever convinces enough mathematicians that the theorem follows! Classical logical systems are a very good way to do that, so they get used a lot. But they're not the only way, and involving a computer program makes most proofs less convincing rather than more.


> It's either "computer says yes" or "compu'er says noooo".

Sadly not; sometimes it is just:

computer says


Or "computer says x but you don't trust the hardware/software".

Or "computer says x but you don't agree the mathematical concept was correctly formalized into the proof engine language".


> The other thing with math is the fact that you can get completely scooped and your work is worthless

Why math specifically? One would think this applies in virtually all fields.


Increase in development time (publication can take 6 months to a year).


> Since proofs are programs one can basically say that mathematical theorems are incredibly detailed software that is completely open source and invites people to identify programs that don’t work and or fix issues.

I would say it's more like pseudocode. There can be quite a large gap between a normal proof and a machine-checkable proof, which is the computer-program version.


Aren't programs generally more complex than math proofs?

Like, the more accidental edge cases people produce, the less they understand the program.


Not any proof with quantifiers. As soon as a statement contains ∀ (for all) or ∃ (there exists), you have a way bigger branching factor, and the claim after the quantifier needs to be true for all (or at least for one) of the elements of the set you draw from.
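
A rough illustration (mine) of that branching: even brute-force checking a single ∀∃ claim over a small finite set multiplies the work out over the whole domain.

    # brute-force check of "for all x there exists y with x + y == 99"
    # already costs up to |domain| * |domain| candidate checks
    domain = range(100)
    claim = all(any(x + y == 99 for y in domain) for x in domain)
    print(claim)  # True, after up to 100 * 100 checks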


Yes, what I meant was, math proofs are well defined, and programs are often a heap of stuff thrown together.

While it's harder to produce the math proof, it's probably harder to grasp what's going on in a program in a mathematical sense.


The root of the replication crisis in social sciences is not just that many papers fail to replicate, but that there is no way to clearly resolve a result that fails to replicate. A paper claims that pre-K increases test scores a decade later, another paper claims it doesn't, and there's no clear resolution. The disagreement just festers, with both sides citing research that supports their opinion. The argument often "spills out" into the public sphere.

In mathematics and computer science, there are many errors in published papers. However, once you point out an error, it's usually pretty straightforward to resolve whether it's really an error or not. Often there is a small error which can be fixed in a small way. Exceptions like the abc conjecture are rare.


> When I’ve served as a peer reviewer I’ve read the papers closely and checked all the steps of the proofs, and that means that I have replicated the results.

The side effect is that math papers have an insanely long time to publication. Perhaps 6 months, or a year or more if you are unlucky.

In physics, the publication time is more like 3 months: something like 1 month for the first review and then two months for making small changes suggested by the referee and discussing with the editor.

As a side^2 effect, some journal citation indices count only the citations during the first year. But the papers containing those citations are sleeping on the reviewers' desks during that year, so the number is lower than the real one.


My completely subjective opinion is that at the highest levels of math, there are only a handful of people that are even capable of peer reviewing, and their time is in high demand.

Wiles's proof of Fermat's Last Theorem is like 120 pages long and he first delivered it disguised as a class to a bunch of grad students who barely understood any of it and hence gave no feedback. Because this is Fermat's Last Theorem which is famous, eventually people in the math community that understood Wiles's work reviewed it and found an error. Had it been a 120 page proof of some not famous problem like random chessboard thought experiments, it probably could go years without anyone seriously looking at it.


It seems that a lot of commenters here have not read the post, and are simply posting their own opinions about how to answer the question in the title. If that is you, and you are interested in the question, I recommend reading it in full -- there are a lot of interesting points there.


Maybe there isn’t a math replication crisis, but there is a kind of math fragmentation crisis. Sub-fields of mathematics have gotten so deep that specialists can’t communicate with one another [0]. I’ve heard this expressed by other mathematicians as having only 2 or 3 other people in the world that they can discuss their work with.

The level of solitary study necessary to make progress is another sign of this phenomenon. Andrew Wiles famously spent six years alone in his attic to come up with his proofs for Fermat’s Last Theorem.

[0] https://sites.math.rutgers.edu/~zeilberg/Opinion104.html


In Soviet Russia, mathematicians, physicists, etc came to the same conclusions as their enemies in the USA.

However, the social science results, although unified in each country, were almost always different and in conflict.

Social sciences are politics and nothing more. It’s an opinion of how the world ought to be.


because math doesn't do experiments.


Or as noted in the article:

> But one of the distinctive things about math is that our papers aren’t just records of experiments we did elsewhere. In experimental sciences, the experiment is the “real work” and the paper is just a description of it. But in math, the paper, itself, is the “real work”.

And

> And that means that you can replicate a math paper by reading it.


I think that means that the word "experiment" isn't the right term for what most mathematicians do.

I'd say most times it's "modeling", not "experimenting"


This seems the obvious answer. The replication crisis isn't about published material being wrong, it's about the inability to reproduce the results of experiments or studies in a repeatable fashion.

It's not like you make a hypothesis in math and then need to go away and interview a sample of 1,000 circles and report back that, controlling for ellipses that may be misreporting as circles, the ratio of the circumference to the diameter is 3.2 +/- 0.1 (p<0.05).
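
(Though if you really did want to run that parody study, a tongue-in-cheek Monte Carlo sketch gives you exactly that kind of estimate, confidence interval and all:)

    # an "empirical study of pi": throw random darts at the unit square
    # and count how many land inside the quarter circle
    import random, math

    n = 100_000
    hits = sum(random.random() ** 2 + random.random() ** 2 <= 1
               for _ in range(n))
    p = hits / n                         # fraction inside the quarter circle
    se = 4 * math.sqrt(p * (1 - p) / n)  # standard error of the estimate 4*p
    print(f"pi = {4 * p:.3f} +/- {1.96 * se:.3f} (95% CI)")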


Exactly. A math study is not a "study" in the sense of "hey i saw a funny pattern in some data maybe it's a sign of my pet theory" - it's literally already proven when published. There's nothing more to do.


There are a lot of math papers / work that are "hey I saw a funny equation/morphism/shape/etc., what does it 'do'?", presenting a lot of conjectures / constructions but weak on anything conventionally called a theorem. https://erich-friedman.github.io/packing/ is probably one of the best known on HN.


https://www.experimentalmath.info/ would beg to differ, I think.


"Mathematics is the part of physics where experiments are cheap." - V.I. Arnold


Until you start multiplying infinities on your auto scaling AWS cluster


This is my take.

Science attempts to describe reality. Math attempts to create rules/axioms.

They're not the same pursuit, although they can often be useful together.


It does (especially using computers); it's just that experiment is not the sole criterion of truth, as it is in, say, physics.


I think it's a good article.

But it's worth considering the possibility that mathematics could have fallen, and could still fall, into a state where false results are frequently published, and isn't 'protected' by anything special in the nature of the field or its practitioners.

Just as you might find yourself asking "why did city A fall into the grip of organised crime, and not city B?". You might look for answers in the methods of police recruitment or a strong history of respect for the rule of law or anything like that, but it might turn out that the answer is really just "city A got unlucky".


The social science and medical replication crisis seems like it would be far more impactful than a mathematics crisis, right? Politicians, policy-makers, doctors, etc. all make decisions based on potentially flawed or outright incorrect studies in a way that I don't think is true for the equivalents in math, simply because there aren't decisions and policies up for debate related to much of them (if I am wrong about this, please correct me).


If a flawed mathematical paper were used as the basis for what then became a flawed cryptography algorithm, I can see that having impact if the bad guys noticed the flaw first. But yes, I expect examples like that would be comparatively rare.


In cryptography the math is almost always the strongest part, and it is the side-channel attacks and implementation mistakes that let the bad guys in. When it is the math, the flaw is often that the algorithm has all the desirable properties proved in a number of papers, but has some exploitable structure that analysts can turn into an attack.


Thanks for this, cryptography is a good example where this could be a problem.


I think politicians generally don't make policy decisions based on science. I live in the UK and an obvious example is drug policy. They even fired the scientist whose drug research they didn't like.

But if the science agrees with a decision they've already made then they're happy to use it for justification, even if the science is junk (e.g. the crazy fines for taking children out of school in the UK).


Because math isn't a science, there is nothing to replicate. If the analysis is sound, that's it; there's no question of whether the physical results the analysis was applied to are fraudulent, measurement error, due to improper setup, etc.

Why is this even a question? Analytical and empirical fields face fundamentally dissimilar challenges.


One time when I was still a sophomore, a friendly professor shared a story about how one of his colleagues had a math paper accepted, and found an error in the proof only after it was published. So he had another paper in that journal next, with the refutation of his earlier work. And that feat was worth double the point score awarded to the institute by the scientific powers that be.

I'm sure the excuse is that errors in math papers these days don't have the same kind of impact as mistakes in, let's say, medicine.

Later on, my guess was that this would change when we have editor software that is easy to use, WYSIWYG, renders math as well as LaTeX, and where the edited document is actually integrated with the formal proof beneath the fancy, elegant phrases and formulas.

Then I quit a job and enrolled for phd studies but that's another story.



I wanted to post the same thing, so was looking for a top comment addressing it and glad to have found it. It's weird how many people, even on HN, do not get this distinction. Math gets thrown in with all the sciences but it itself is not a science - it doesn't have experiments, it doesn't follow the scientific method.

Mathematics cannot have a replication crisis because there is nothing to replicate. In math correctness is tested not by redoing something but by double-checking the original work for mistakes.


There is one corner of math that does have a replication crisis. Just as we compare programming languages by how "ergonomic" they are to learn and use, mathematicians do come up with novel notation systems to try to improve the ergonomic state of their field, and since "ergonomics" is another way to say "esthetics", and is proved or disproved by user testing, that is where replication gets hard.

The inventor of category theory's wiring diagrams, for example, has claimed that he could get middle schoolers to understand them. I suspect that success has not been replicated.


> The inventor of category theory's wiring diagrams, for example, has claimed that he could get middle schoolers to understand them. I suspect that success has not been replicated.

Maybe I should try. I like category theory and I am a middle school teacher and like teaching electives on improbable things :D


The replication crisis comes down to funding. If you publish bad results in maths, you lose funding and get fired and humiliated. If you publish bad (but exciting or popular) results in social sciences, you get more funding and acclaim. All subjects are basically (sadly) products. Maths, physics, and engineering are products that have to (mostly, eventually) actually work. Politics, sociology, gender studies, etc. are more about entertainment. Some fields (e.g. nutrition) are a hybrid of the two: some serious factual research, some fads and fashion.


Mathematics is fully deductive (so it is not strictly speaking a science - it is not empirical, but its own kind of discipline, like logic, if you don't consider that part of mathematics), so replication can be attempted by machine, and as the author writes, proof-checkers find more and more mistakes. A water-tight proof essentially is the trace or log of a series of mechanical deductive steps.

But because journals publish human proofs, one of the following cases can arise:

- A proof can be wrong and the underlying conjecture can still be true (i.e. a theorem).

- A proof can be wrong and the conjecture can be false - these kinds of errors need to be corrected with utmost urgency, because they lead to follow-up downstream errors.

- A proof can be right and recognized as such by peers, in which case it gets published and everybody is happy.

- A proof can also be right and be contested by peers. This happens when proofs have gaps that are "obvious" to some, but not believed by others: a proof is a sequence of steps that are either previously derived or self-evident, and people simply differ in what they consider self-evident. (As an aside, in school I was criticized in maths for "skipping steps" and in English for "jumping thoughts"; to me it seemed obvious where things were going, but the teachers obviously needed "comments to the code", so I learned to insert baby steps, and everyone was happy.)

Because of the last case, it is important to get down to the most formal, fine-grained, nitpicky, atomic level, so everyone can agree that each micro-step is self-evident, and thankfully we can delegate this to machines nowadays (provers like the "Isabelle" system - https://www.cl.cam.ac.uk/research/hvg/Isabelle/), at least to an extent. Ironically, that's not what real proofs in mathematics journals look like at all. They're written in prose, and often in a surprisingly "meta" style (we had algebra lectures where the professor talked about "colored roosters" and how they behave, which is the name of some structure in the algebraic sub-field of Ramsey theory).
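
(To give a taste of that atomic level, here is Lean 4 rather than Isabelle, purely as an illustration: even a trivial fact gets spelled out in micro-steps the machine can check.)

    -- every micro-step explicit and machine-checkable
    example (a b : Nat) (h : a = b) : a + 1 = b + 1 := by
      rw [h]  -- rewrite with the hypothesis; reflexivity closes the goal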


My summary: in social science, measurements are coarse, which means the statistics cannot evaluate subtle changes. In math, logic is very precise, so subtle changes can be measured.

The author makes the point that both social scientists and mathematicians are trying to prove subtle things which are actually probably true. In other words, the social science replication crisis arises because the experiments are often impossible to perform consistently when the effect is subtle, leading to the use of inconsistent lucky draws to demonstrate things.
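
(A quick simulation of those lucky draws, my own sketch with invented numbers, shows how a small true effect plus a significance filter inflates the published record:)

    # a small true effect, many noisy underpowered studies, and a crude
    # "p < .05" publication filter: the published record overstates it
    import random, statistics

    true_effect, sd, n = 0.1, 1.0, 20
    published = []
    for _ in range(10_000):
        sample = [random.gauss(true_effect, sd) for _ in range(n)]
        mean = statistics.fmean(sample)
        se = statistics.stdev(sample) / n ** 0.5
        if mean / se > 1.96:  # only "significant" results get published
            published.append(mean)

    print("true effect:", true_effect)
    print("mean published effect:", round(statistics.fmean(published), 2))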


Why not turn the premise of the article around? Instead of suggesting that math might also have a replication crisis, why not question whether the whole replication-crisis thing was overblown and effectively more of an ideological attack on a few disciplines?

Errors in science, disagreement, and lack of reproducibility are, I think, common and prevalent, but that doesn't necessarily imply that a discipline as a whole doesn't make progress. The obsession with statistical accuracy and the 'science as bookkeeping' mentality seems fairly new to begin with, and science did just fine before we even had the means to verify every single thing ever published.

It kind of ignores the dynamic nature of science. Most of what is published probably has close to zero impact regardless of whether it's right or wrong, but paradigm-changing research generally asserts itself. Science is evolutionary in that sense: it's full of mistakes but stumbles towards correct solutions at an uneven tempo. In a sense you can just look at it like VC investment. Nine times out of ten individual things don't work, but the sector overall works; the market economy is full of grifters and failed businesses, but it doesn't matter that much.

So, maybe half of math is bullshit, but so is everything else; in math people just say "whatever" until they find something good, whereas in psychology people use it as an opportunity to hack away at it.


> It kind of ignores the dynamic nature of science. Most of what is published probably has close to zero impact regardless of whether it's right or wrong, but paradigm-changing research generally asserts itself.

The problem is that psychology, unlike physics, doesn't really have a paradigm. There's the old saying, "Extraordinary claims require extraordinary evidence." But for that heuristic to be effective, you need a standard for what counts as extraordinary. Extraordinary compared to what? In physics, there are two well-established paradigms (relativity and quantum mechanics), which establish what counts as ordinary and what counts as extraordinary. So, for example, if you're claiming that the distribution of dark matter in the cosmos is more clumpy than predicted by existing models, or that the energy level of a particular field is 12 MeV rather than 10, those are ordinary claims, which can be accommodated by tweaks to the existing paradigm. But if you're saying that the speed of light has varied over the history of the universe, or that all subatomic particles are actually tiny vibrating string-like structures, well, that's going to require a lot more evidence.

In psychology, it's much more difficult to have that kind of intuition. Take the concept of priming, for example. Is claiming that people walk more slowly when they're encouraged to think of things that make them feel old extraordinary? It makes a certain sort of intuitive sense, but, on the other hand, there's absolutely no causal mechanism suggested. So when a number of priming studies fail spectacularly under replication [1], I don't know what to think. I don't have a good sense for how much of psychology is overturned by the replication failure, in the same sense that I'd have for physics if it turned out that e.g. the speed of light is a variable rather than a constant.

[1]: https://mindhacks.com/2017/02/16/how-replicable-are-the-soci...


Psychology currently is just a pseudoscience, really. It is at the same stage general medicine was at before the microscope was invented and bacteria were discovered: they try to analyze phenomena, but they have not discovered the underlying principles yet. My bet is: the real advancement in treating mental disorders will come from AI/neural-network researchers, not psychologists.


> In physics, there are two well-established paradigms (relativity and quantum mechanics), which establish what counts as ordinary and what counts as extraordinary.

Actually, I would say that these well-established paradigms establish what counts as extraordinary. In other words, relativity and QM are examples of extraordinary claims that we believe because we have extraordinary evidence for them. Both of these theories say all kinds of extraordinary things, and most people who first encounter the theories start out thinking they can't possibly be true. We believe them not because they are just ordinary, but because we have taken the time and effort to accumulate extraordinary evidence for them.

In that light, the replication crisis in other areas of science is easily explained: they allow extraordinary claims to be published without the extraordinary evidence that those claims would require. So of course many of those claims turn out to be wrong.


The difference between the two examples you have it that venture capital and the market economy openly embrace their flaws, whereas science and academia refuses to acknowledge (or manage) the "humanness" of the system and projects a hyper-enlightened ideal both internally and to the outside world.


This isn't turning the problem around in the same sense. To effectively turn a problem around you need to use its complement, and then it should be a binary proposition. Your example, where you suppose an ideological attack, fails here because it is not the only other explanation; you haven't turned the problem around in the way that "what is the probability this action had an effect" can be turned around into "what is the probability that this action had no effect".


Unrelated, but the replication crisis in medicine is the primary reason why anti-vaxxers have a right to be skeptical, even if they're wrong and spout nonsense. Now we *know* that a large percentage of cited medical research is fraught with uncertainty and is potentially non-replicable. Therefore, people have a right to be skeptical of "the science" even if they don't have a coherent counterargument. It is extremely reasonable to be suspicious of statistical methodologies in vaccine safety studies and adverse reaction reporting; there are statistical errors and even outright fraud all over the scientific landscape, so why not here? And given the extremely perverse financial and political incentives found in this particular pandemic environment, I would be extremely surprised if there was NO foul play happening, because you would have to be a hero to ignore these incentives.


When I was getting my PhD in chemistry, the replication crisis had not been recognized (widely, anyway). As it has become more widely discussed, I've vacillated between being annoyed that people don't make the distinction between the hard/simple sciences and the fields that study much more complicated, multi-causal topics, and being self-critical that perhaps chemistry also suffers from a widespread problem.

I still don't know the answer, but I suspect there is something to the simple/complicated distinction.

I do recall that there was this one professor who couldn't replicate the seminal work that basically got him tenure, and it was all very dramatic. The kind of story that we would share with each other as a byword.

Also, I wonder if errors get caught more quickly because in chemistry, and in physics to a degree, most progress is built directly on previous results. So you can't have bad results just floating unnoticed for very long.


The thing is, mistakes will always be made and corrected. Maybe the experiment worked the first time because someone smoked in the room yesterday and a trace amount of contaminant made it in the solution and catalysed a reaction. Maybe your supplier had a minor contamination on your reactants. The reasons are endless.

The issues I have with the "replication crisis" are twofold: malicious publishing of false results for personal or organisational profit, and malicious interpretation of prestigious publications as the one and only immutable truth. Both of these are social and political problems.

Scientific publishing has functioned fine so far for the same reason the old internet was full of true information even though it didn't have to be: there was little to be gained by lying. Now that government policies are built on PhD papers, things are changing.


Just a fun fact: there are dozens of papers released each year which claim to have proven that P = NP, or that P ≠ NP. E.g., they "prove" it with soap bubbles: https://arxiv.org/pdf/cs/0406056.pdf


Well to be honest, things on arxiv are not properly speaking “papers” in the academic sense. At least in Maths.


There are lots of cryptographic protocols with mathematical proofs that contain errors that break the security of the protocols themselves. A version of MuSig in Bitcoin was a protocol like that, though luckily the mistake was caught by the cryptographic community when the authors wanted to publish the proof at a conference.


"A mathematician's fame is proportional to the number of incorrect proofs they have published."


Because practical applications of such work are usually several decades away. A bad medical study can kill people today. There's not a lot of incentive to check, or to complain about consequences far in the future.


Very interesting! As a social scientist, the replication crisis gets me seriously down, making me doubt the integrity of my field. It's good to see that there's not such a problem in maths...



"We get away with it becuase we can be right for the wrong reasons—we mostly only try to prove things that are basically true."

Apart from the replication crisis, the other crisis that is not really talked about is funding. Academic funding basically comes from one of 3 (connected) sources - government, corporations and the military. Somehow or other - these 3 sources have pretty much the same or non-conflicting aims. These aims relate to power and control. This is actually the largest crisis, IMO.

Given that information, we can re-assess the quoted statement the author makes.

Perhaps it's not that they are proving things that are "basically true". It's that right or wrong do not matter. What does matter is that the answers provided meet the agenda of those funding the study. The answer is not that important as long as it is supportive of whatever agenda is in play. I believe this is the case for the replication crisis in science also.

A replication "crisis" is only a crisis if you are attempting to achieve truth and greater understanding. But truth and understanding are only ostensible reasons, not the actual ones. What these studies are actually doing is creating a parallel construction - the aim is actually for studies to appear 'truthey', without actually being so. What studies should actual do is increase the funder's power, wealth extraction abilities, etc.

If you doubt this and think that truth matters, consider this: surely we should have cracked the best diet for people by now? But there is no common understanding of what is good or bad to eat - if anything, there is more confusion. The reason, of course, is that there's no money in recommending whole foods or whatever. However, there is money in drugs to make people 'better'. And money in making diet so confusing that people eat themselves into trouble.

Anyway, if you are in the business of governing or monetising the masses, truth and understanding is the last thing you want. Far better to have a story that gives you control, or extracts money. Such is life under fascist governance (where fascist = corporation + governance working together).


The replication crisis is not largely due to the math/stats used in studies.

It comes from trying to make inferences from noisy and often confounded experiments with limited data.


I like this sentence from the link:

> Many papers have errors, yes—but our major results generally hold up, even when the intermediate steps are wrong!


Not being a formally trained mathematician or psychologist but being someone who’s seen plenty of organization/institutional mechanism design:

is it possible that, e.g., psychology research might become more mathematically rigorous if there were more incentives for serious mathematics people to study psychology?

I appreciate that at first glance that might seem a bit vacuous, but there's a ton of money and way better policy at stake if we understood behavioral economics better, and that's at least an adjacent field to psychology, right?


A somewhat cynical take is that it is harder to mislead in theory. Your mistakes will get caught, and you risk losing your reputation, so it's less likely that someone will publish a proof without double- and triple-checking. Theory subjects have faster verification loops.

Verification and review are harder in experimental work, and it might take a few decades before someone finds an error, or even collects the resources to verify a claim. So maybe it's harder to fight the 'publish or perish' demons in the experimental sciences?


How can there be errors in math papers if math formulae are trivial to check formally using a computer?


Math formulae are not trivial to check formally using a computer. Mathematical software is in its infancy relative to the sophistication of mathematical objects defined in the literature.


Because the current state of the art in automated proof checking is still trying to check all the proofs you'll see in an undergraduate program, and is nowhere near able to check state-of-the-art proofs.


"All science is either physics, or stamp collecting." (Ernest Rutherford)


The article's linked article on social priming has a fun typo:

4.10 The “Lacy Macbeth Effect”


In mathematics there is no measurement uncertainty.


Because math doesn't rely on experiments.


2+2=4


This gets at the heart of the difference between science and math. What actually is science? And what actually is math? You may think you know, but most people don't.

Did you know? That in Science, and therefore reality as we know it... NOTHING can be proven to be true. Proof is the domain of logic and mathematics, NOT science.

There is a huge difference between science and mathematics.

Math is an imaginary game involving logic. You first assume logic is true, then you assume some additional things, called "axioms," are true. You take your theoretical universe to encapsulate only logic and the axioms; that is everything that exists in it. You then derive, or prove, statements that you know must consequently be true in your made-up universe based on logic and the axioms. These statements are called "theorems." That is math...

For example the "pythagorean theorem" is a statement made about geometry assuming that the axioms of geometry are true and logic is true.

So of course if it's a logical game, there's no real replication crisis. All theorems in mathematics are proven true based off of logic. You can't replicate conflicting results if a result is PROVEN to be true via pure logic.

Science is different. Nothing can ever be proven to be true. There are basically two axioms in science. We assume logic is true, just like in math. And we assume that the mathematical theory of probability is a true model for random events happening in the real world. Then we assume everything else about the universe is completely unknown. This is key.

Because the domain of the universe is unknown... at any point in time we can observe something that contradicts our initial hypothesis. ANY point in time. That means even the theory of gravity is forever open to disproof. This is the reason why NOTHING can be proven. To quote Einstein:

"No amount of experimentation can ever prove me right; a single experiment can prove me wrong."

So sure, mathematicians can make mistakes... and I know the author is talking about higher-level details... but at its most fundamental level, assuming all other attributes are ideal, it is fundamentally impossible for math to have a replication crisis, while science, on the other hand, is forever open to these crises so long as the universe is unknown.

The most interesting thing to me in all of this however is that within science, probability is a random axiom. We have no idea why probability works... it just does. It's the driver behind another fundamental concept: Entropy and the arrow of time. For some strange reason Logic in our universe exists side by side with probability as a fundamental property.


> All theorems in mathematics are proven true based off of logic. You can't replicate conflicting results if a result is PROVEN to be true via pure logic.

You can publish a paper in mathematics that claims to prove something, but is mistaken. A paper claiming that a theorem is proven, is not the same thing as the theorem being proven. However, that's not often the case—why that is, is an interesting & meaningful question.


Yeah I get that. What I'm saying is that outside of mathematics, a proof is fundamentally impossible. It can never happen.



Yeah, but nobody tries to prove anything that can't logically be proven. So not only is there no replicability crisis but such a thing can't exist in the first place.


People try all the time to prove things that can't logically be proven. Probably lifetimes have been spent on Euclid's parallel postulate alone - over 2000 years of attempts! Before Russell, Frege thought his logic was consistent. Before Gödel, it was believed we could formalize a complete arithmetic even if Frege's specific design failed. Before Church and Turing, the Entscheidungsproblem was considered potentially solvable even if Gödel's work ruled out the resulting proofs wholly capturing powerful theories.

It took a very long time to even notice the axiom of choice, and longer to prove that attempts to prove it were futile.


What I'm saying is no papers are written about it unless the author made a mistake. In the absence of a genuine logical mistake, no replicability crisis is possible.


Because math is not a science.


More like because maths are an actual science.


Huh?!?

Please read about the abc conjecture and the whole saga around its proof, Shinichi Mochizuki, Peter Scholze, etc., etc., etc.


Did you read the article? This saga is explicitly mentioned and does not detract from the author's point, IMO.


But it does! The author is just plain wrong.

Here there is a piece of very important math research which the community failed to replicate in either a positive way (confirm it) or a negative way (reject it).


Yes, but it's famous because it's the exception to the rule. It's definitely a failure to replicate, but that does not necessarily mean that math has a replication crisis in the same way the social sciences do.


> It's definitely a failure to replicate, but that does not necessarily mean that math has a replication crisis in the same way the social sciences do.

But it does! Look, in the social sciences there are a lot of lousy papers written by people who don't know statistics, acceptance criteria, etc. There are problems with "average" papers.

In math there are no problems with "average" papers, AFAIK. People follow those papers and rather quickly find flaws or confirm the results.

In math there are problems with "exceptional" papers. Holes in the proof of Fermat's Last Theorem and in the abc paper took years to find, patch, and settle.


What you mention here is not a "replication crisis", it's simply a settling of proofs that are at the advancing edge of the subject. Analysing recent papers is not what the "replication crisis" is about.

The "Replication Crisis" is about confirming long-standing results that are widely accepted, and yet when re-examined for the purpose of replication cannot be confirmed.


I fail to see the difference. "abc" was accepted - it was published in a peer-reviewed journal.

Math is different from social studies, but (and here I disagree with OP) it has its own share of replication problem(s).


and abc saga is still ongoing...


And everything (absolutely EVERYTHING) is open and published


Steven Pinker takes a decent crack at "Statistical Significance" in his new book Rationality (https://en.wikipedia.org/wiki/Rationality_(book)), which underpins a lot of this and is mentioned in this piece. And I'm still grappling with this part of the book. lol


Because it's largely not an experimental science, but rather is groups of people convincing themselves they know what someone is talking about.

As the article pretty much says.

The article also suggests an "exceptionalism" of maths in that their results hold up even when their methods don't. In other words math papers are full of "unimportant flaws and errors" but even though that's the case when corrected or verified, somehow the conclusions of the papers mostly hold up, in the author's opinion.

But I tend to think this is more because, math is groups of people (and formal verification systems) convincing themselves they know what someone else is talking about, so confirmation bias is probably in effect.

The limitations of a self-consistent set of axioms after all...

You may say that formal verification systems are perfect oracles... But if papers have "unimportant flaws and errors" it's possible that those systems have them, too, right? It's also possible that as those systems were written by people, they reflect and somehow seek to confirm the biases of those people or their understandings (or misunderstandings) about mathematical knowledge.

So I don't think maths is quite as exceptional as this author hopes it might be. Though maybe it is.



