Hacker News
Why isn't there a replication crisis in math? (jaydaigle.net)
302 points by jseliger on Feb 2, 2022 | 253 comments



The paper here seems to make absolutely zero distinction between deductive and inductive reasoning, which should be the entire point.

Math is an arbitrary framework, built from arbitrary axioms. It is deductive. Thus, all proofs are simply deduction. The knowledge here is positive knowledge, we can show things are true, false, or undecidable. There may be errors, but those are errors in execution.

Psychology is not built on a framework. It is inductive, we are trying to find axioms that map to the data we collect. Thus, all papers are trying to add to/build its arbitrary framework. The only knowledge here is negative knowledge, falsification, we know only what is a failed hypothesis. There will be errors, both in execution, and there will also be statistical errors in experimental results.

The entire point of the replication crisis is that we don't publish or pay attention to results that are boring, so the framework we build is built on skewed data. We don't reject previously popular papers that are now unfalsifiable (the idea that the now unfalsifiable Milgram experiment is still taught in every university psychology dept should be outrageous). The boring results need to be weighted statistically against the interesting results, but aren't, etc. Nobody out there is arguing whether or not the axiom of choice is a true axiom. It sort of doesn't matter, it can't matter, because it's arbitrary by definition.

You can't have a replication crisis inside of a deductive framework without changing the framework. This doesn't happen too often, but we did see this during the shift from Newtonian to Einsteinian physics. The study of the philosophy of science is fairly obscure, but it is at the center of this discussion.


I think you're wrong on some of these points, and the author was trying to make these points but was not explicit.

I remember reading an article in the Mathematical Intelligencer by a mathematician who essentially said there are certain conjectures where, if they were to be proved false, rather than be thrown into a sea of uncertainty, mathematicians would quickly move to investigate a readjustment of basic axioms rather than accept that those conjectures are incorrect.

Then there are fields of mathematics around selecting different axioms. Investigating the ramifications of whether you take the undecidable "continuum hypothesis" as true or false. And then there's model theory and such. Presumably they study models of interest and not arbitrary ones.

You're mostly correct that the methodology is deductive, but the point is that what we choose to use isn't arbitrary, because there are things in math which are more important than the axioms: they are things believed to be "real".


I don’t mean "arbitrary" in the sense that it doesn’t matter to the engineering of a building. I mean "arbitrary" in the sense of unprovability within the framework. Axioms are by definition arbitrary in that sense.

edit: I'll try to expand on this. Math is weird because it's a sort-of in-framework study. We use math when we do physics, and we use physics when we do engineering.

If the physical world ever started disagreeing with our physics, then we need new physics. If the math ever started showing inconsistent results in our physics, then we would need new math.

None of that is to say that Newtonian Physics, say, is "wrong in the sense that it's internally inconsistent", only that "it's wrong in the sense that it is not an accurate mapping of a framework to the world we find ourselves in." The first type of wrongness is the type of wrongness we concern ourselves with in deductive studies, the second type, the mapping-error, is inductive, and requires meta-analysis, and is prone to (Hume's) problem of induction, which means it's ultimately unknowable (i.e. Popper).


Here is where you're wrong:

> You can't have a replication crisis inside of a deductive framework without changing the framework.

You may change the framework, but the replication crisis would be that people don't notice it's changed.

A proof can get so huge and complex that it would take a lifetime of study to understand it. One of the article's good points was that "you can replicate a math paper by reading it", but if you cannot read a proof, you cannot replicate it. If nobody can understand it, nobody can replicate it. There would be a big crisis if people started trusting huge papers blindly and using their results without stating them as assumptions. Mathematicians are largely not doing that, so there isn't really a crisis. But they might! And you would not notice that the framework had become inconsistent for a while. Therein would lie the crisis.

It is nice to think that as soon as the framework changed, people would notice, because surely if the ground moved, that would be obvious! But it is not true. Nothing about the deductive-ness or in-framework-ness of mathematics changes this. It is the same for psychology, the replication crisis is a crisis because people do not notice these results are wrong, and then rely on them in clinical guidelines and affect people's lives for generations.

It is possible to stave off replication crises by using a machine, because mathematics is self-contained and you can use a computer to show that you proved something using only these specific assumptions and they can believe you without reading and understanding it fully. But this doesn't mean mathematics cannot ever have a replication crisis.
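For illustration, here's a minimal sketch of what that machine checking looks like (Lean 4; the toy theorem is mine, chosen only for brevity). The kernel verifies the proof, and you can ask it exactly which axioms the result depends on:

    -- A toy theorem the kernel can certify without a human reading the proof.
    theorem my_add_comm (a b : Nat) : a + b = b + a :=
      Nat.add_comm a b

    -- Reports exactly which axioms the proof relies on.
    #print axioms my_add_comm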

I think your arguments here are best summarised as confusing "replication crisis" with "any published results being inconsistent with previous results". Your idea of "inconsistent" is also strange: the article considers this to mean mistakes in proofs, whereas you mean new mathematics that changes the field by discovery. You're just talking about something completely foreign to the concept of a replication crisis.


Here's where you are wrong:

> But they might! And you would not notice that the framework had become inconsistent for a while.

As the author stated, the math can be reproduced by reading it: ergo, there is no barrier to entry except knowledge. The other sciences do not have this luxury - they require access to specialized instruments, participants, funding, and most importantly - time. Math can be checked nearly instantaneously, so the "might" becomes inconsequentially small.


> Math can be checked nearly instantaneously, so the "might" becomes inconsequentially small.

Can it though? Pickover's 2013 Edge response comes to mind: can hundreds of pages of a proof, based on axioms that require particular expertise to understand, be checked instantaneously? I think it would be easier to replicate the typical psychology experiment than to check the math in that case.

https://www.edge.org/response-detail/23670


A few edge-case examples do not constitute a crisis. The point is not that 0 cases happen in mathematics, but rather that unreproducible results are not the norm there, while in other sciences they are.


You might be mistaking me for someone who stated there is a math replication crisis. I replied to one specific claim.


Also, math is universal. A study can be true in one place and false in another; a proof cannot. This is another barrier to replication.


Math cannot be checked instantaneously. Mochizuki's proof of the abc conjecture, mentioned in the article, was first released in 2012, and despite conferences and lots of time spent trying to figure out what he's saying and whether it's correct, there is still no consensus.


As an aside, as I understand it there is general consensus now that the proof is flawed: https://www.math.columbia.edu/~woit/wordpress/?p=12220


Thanks for the post. This is the first time I've read an article by a math professor written in such an acerbic and relatively hand-wavy way. Seems like over the years this proof must have generated passionate disagreement.


Right, but that's a particularly rare example in mathematics, not the norm.


> no barrier to entry except knowledge

That’s fundamentally it, there’s a barrier to entry, and it’s knowledge. Other sciences may require specialized instruments and time, but math is also subject to the problems of knowledge not being instantaneously transferable. And I imagine, although I’m not certain how prevalent it is, that many branches of mathematics are also dependent on instrumentation these days, in the form of software.

And we all know how software can be…


This scenario actually came pretty close to happening when Andrew Wiles published his proof of Fermat's Last Theorem. The proof was huge, IIRC over 100 pages of maths, some of it pretty obscure, and verifying it took a lot of work by several people.

They even found a flaw in it, but Wiles was able to fix it after a year.


Or now the "proof" of the abc conjecture with the inter-universal Teichmüller theory


I’m talking about an event like the incompleteness theorems, which ended aspirations like those of Russell and Whitehead, or paradoxes that led to new axioms, as in the development of set theory.

Deductive frameworks can’t be wrong in the way that experimental results can be statistical anomalies. The author conflates errors with perfectly reasonable experimental results later shown to be anomalous upon replication.


Again you are talking about the framework as if it is exactly the same thing as the papers that attempt to describe it. It is not. Your statement that "deductive frameworks can't be wrong" is irrelevant to the question of whether a math paper can be wrong. If I prove something, it can be wrong, and it may turn out that I haven't proved it at all, but I can still call it a proof, convince other people it's correct, get it published in a journal, and have other mathematicians rely on it before the mistake is discovered. Papers can be flawed. The papers are not the framework. The reason you think they are the same thing is probably that mathematics has not had that many big mistakes recently, since the renewed effort on solid foundations last century, which means the system is working.

You can't define your way out of the fact that mathematicians will make mistakes and write papers that draw conclusions that are not valid. They are just pretty good at discovering the mistakes, and finding them is culturally essential to the practice of mathematics; arguably if it were not, it would be a big game of Numberwang. Psychologists are, on the other hand, not nearly as good at discovering mistakes. We didn't have to use any properties of the self-containedness of the disciplines to get to that result. It was not necessary to prove mathematically that mathematics does not or cannot have a replication problem. You can just observe it.

(Edit, all of that in brief: if it were truly impossible to be wrong in maths, then why do mathematicians spend so much time trying to verify each others' proofs? Are they wasting their time? Or is it the entire reason there isn't a replication problem?)


I think this discussion is a bit off, moving the root of trust from the mathematicians to the computers is the thing that will not only kill mathematics but also our civilization (and I will try to argue this overstatement now).

People have a hard time understanding things on extreme ends of the scale because they just assume everything is within some reasonable bounds of effort, this is a failure of imagination.

Mathematics is not a reasonable amount of effort.

What's more it is depended on by literally everything else we do (as a civilization).

Theorem proving software (so far) makes the fatal mistake of just trusting: the machine it is running on, the operating system it is running in, and several other things that can subvert assumptions. The correctness of a proof is justified by the trust we place in the heavily audited kernel, which will check the proof script outputted by compilers for the human (textual) interface, but that kernel is just another program on your (mostly) proprietary platform.

Mathematics is a religion that currently dominates the planet (and as with most religions, it is totally ambient to people in that they don't realize to what extent they base their behaviour on it) but it is not without competition and the thing we must realize is that this competition is as motivated, ruthless and mission-driven as anything else on the planet.

Just as we have people sitting around learning to recite the qur'an or the bible from memory we also have universities full of mathematics students reciting proofs and theorems by memory, trying to keep the knowledge that brought us this far alive.

Computer memory is inherently volatile and we do not have the same conventions and procedures in place as we do for safeguarding the legitimacy of written texts. These take time to develop and are usually motivated by tragedies (such as the loss of our "prehistory").

Try to imagine the amount of effort it would take to re-assess every piece of information that we have (at current time) because we accidentally let the entropy into our libraries.


You seem to believe that computer software and hardware has no error detection and correction built into it. This is definitely not true.

The reproducibility of computer-derived knowledge is what makes it something we can trust. Especially if it's being reproduced on disparate hardware and disparate software.

Mathematics is possibly the only thing in the entire world that can _not_ be described as a religion.


> Mathematics is possibly the only thing in the entire world that can _not_ be described as a religion.

Math works, whether you believe in it or not.


The results of the Milgram and Stanford Prison Experiments (the amount of harm caused by studying obedience and harm to humans) were instrumental in giving us the very ethics codes that make them unfalsifiable. That makes them an important facet of the field of psychology's history and worth teaching for that alone. If you believe that ethics code is appropriate, then you believe those studies are valid and don't need to be run again to be re-verified or falsified. If you really don't trust the ethics code which was produced in reaction to the results of these 20th century experiments, then the authentic thing to do would be to re-run the experiments clandestinely and disseminate the results anonymously.


The ethical codes are extremely important, that's not my point.

I'm saying we are taught the results are correct, when, due to the impossibility of reproduction we can't know whether the results are correct or incorrect.

What should be discussed in the Milgram experiment is "we can't know the results of the experiment because one of the subjects killed themselves due to their participation. Try to consider the ethical implications, don't design experiments that might lead to psychological trauma, etc." The idea that the results being true is why we shouldn't do the experiment again is nonsense, because we can't know whether the results are true when we can't replicate the experiment.

Yet, its results are published as a pop-science book: https://en.wikipedia.org/wiki/Obedience_to_Authority:_An_Exp...

Instead, these two studies, if their findings (not the ethics) are discussed at all, are a perfect illustration of the problems that Karl Popper discusses regarding falsifiability.


I think one of the subjects killed themselves due to participation because of the deep inner tension caused by our struggles with obedience versus conscience. The results of the studies bear that out - people largely obey authority unto causing serious harm, that's what the studies tell us, and it's not an inordinate stretch to say that this dissonance causes a person to feel an urge to commit suicide such that, if few people obeyed authority giving malicious orders, they would feel little dissonance and would be way less likely to off themselves. And that's the connection between the results and the ethics. It may not be awfully scientific of me, but I'd yell you out of the quietest room on earth for trying to claim the results have no bearing on that.


"We can't know the results because we can't repeat the experiment," is exactly what I'd yell right back. Repeatability is required for falsifiability, and falsifiablity is required for empirical knowledge, period.


Repeatable and repeated are not the same thing. The Milgram experiment is repeatable in principle. It is falsifiable. The reasons to doubt the claimed results are not because we can't do it again, but because of the details of how it was done.

We also should not do it again.


It's like saying that we can't know whether parachutes reduce injuries when falling from airplanes because we've never had a proper experiment to verify this and most likely won't ever run one.

We can know all kinds of things with a reasonable degree of certainty even if we can't or won't verify them experimentally.


We have, in fact, thrown human analogues out of planes with and without parachutes. There are many ways to experimentally verify things in ways that don't require the use of an actual human - and in fact the development of accurate analogues is its own active area of research so that we can continue to expand experimental research into new areas.


This just seems to push the results down one level - how did we do the experiments to determine that the analogues are accurate?


Some historical studies, some modern studies, on human bodies. By combining the data we have on things like nerve conduction studies, muscular studies, in the living and then experiments on cadavers donated to science, and so on and so forth.


Then why were these experiments done in the first place? I certainly take your point and know the paper you're talking about.

Is it obvious that humans will generally shock people until they are told not to? Is it obvious that randomly assigned students will take on the roles that they are given?

I don't think it's obvious. I think that is why the studies were done in the first place, and I am generally quite skeptical of the results. There is no way to verify them, thus teaching them is genuinely bad science.


> Is it obvious that humans will generally shock people until they are told not to? Is it obvious that randomly assigned students will take on the roles that they are given?

It's certainly not obvious, so verification is informative, and that likely was the motivation for doing the experiments. However, despite their flaws - they were limited and biased in various ways - it certainly would be far, far worse science to base our teaching solely on one's assumptions/scepticism/opinion about how things should be, instead of taking into account whatever limited data these experiments provided. Sure, it would be better to have more and better data, but since we won't get it, this does provide relevant information.


Agreed, and they aren't isolated in terms of seminal studies that probably won't be replicated. No ethics board is going to allow us to strap down a seal and put its head underwater, but that's how we learned a lot about mammalian diving physiology.


The cynic in me answers this with: "Because they got professorships out of it". But then I'm quite cynical


>because we've never had a proper experiment to verify this and most likely won't ever run one.

Isn't it trivial to drop something out of an airplane along with the same thing but with a parachute and measure the impact velocity?


But that doesn't prove anything about how effective they are at preventing injury.


I'm not sure what you mean. We can prove that a parachute allows people to survive jumping out of an airplane with limited injuries. They are effective.


For example, the fact that the impact velocity of some random object is reduced doesn't mean it is reduced from a harmful range to a non-harmful range. It might be equally harmful or equally harmless either way, or there might be additional factors at play in the case of human parachutists that we haven't thought about yet.

We don't have enough data on people jumping out of planes without parachutes to conclude that the parachutes are actually working. They could just be superstition.


That's absurd. The mechanics of gravity, parachutes and jumping out of planes are well understood. Plenty of people have fallen from heights without parachutes, and we know what happens to them and why (acceleration of gravity and the force of impact on their bodies). We also know how parachutes, used properly, mostly prevent that sort of harm (they slow people down enough to prevent excess force when impacting the ground). There's nothing superstitious about it.


Yes, and in a similar manner we can know quite a lot about how people would behave in situations like the one in Milgram's experiment even if we won't ever do an exact replication.


Yes, BUT in similar situations people behave very differently than the ones in Milgram's experiment. This is the whole point of it being not replicable.

That this remains true is due to a) no one being allowed to repeat Milgram's experiment exactly and b) more general scientific funding bodies not allocating money for pure replications.


Sounds like you have a solid hypothesis there with a convincing proposed method of action.


Are you saying parachutes have never been reliably tested?


The paper they're referring to is a quasi-satirical piece about double-blind testing. The idea is that if double-blind studies are required for knowledge about medicine, then we have no evidence that parachutes have any effect on saving the lives of people falling from an airplane, since it's unethical to give people in a testing environment a placebo: they would die.


One aspect that is special to the social sciences and not discussed enough (and was partially mentioned in this thread) is that here science is built on moving ground. Viral papers influence society, and society itself changes due to other factors. Metrics become goals. That's why there is a time constraint on the period when falsification is possible. This continuous-validity constraint is unknown, but it exists for every social phenomenon.


I never thought about that. Like Soros’ market reflexivity, but for science.

Makes a lot of sense.


Yes, economic behavior is a good example of this. There are also countless social strategies that are in constant competition. When you provide "objective" proof of some behavior being beneficial, it becomes a competitive advantage for others to have, a "life hack" that you should adopt. And this influx of alien adopters often corrupts it, rendering it less useful and finally skewing the study results. Meditation, and mindfulness practice in general, is a good recent example of such a thing happening.


Then run the experiments again in secret. Find people who are so passionate about your epistemological model they are willing to take the risk of renegade experiments on themselves. That would be an actual adventure, much better than roleplaying a yelling match at each other in comments sections.

EDIT: Hell, pretend play is for weaklings, I'll be your confederate jumbo-dumbo test taker faker, and let's go a step beyond the original experiment, let's have me actually suffer serious shocks in our Milgram Repro, for your n >= 30 guinea pigs to see. Then we'll know even better and that can be the new standard. And I'll be your confederate convict prime, set aside in especially horrible solitary confinement conditions for the Stanford Prison Repro. And when you watch me and others actually suffer horribly for days on end due to your decisions, I'm sure you'll simply have zero suicidal ideation, and you'll repeat your "we can't know!!!11!!1" drivel.


The Stanford Prison experiment was theater. Milgram was only slightly better. Never mind being unethical, they are just poor science. This is the core of the replication crisis in psychology: theatrical, grandiose claims get all the attention and people leap through hoops to defend them.

https://www.theatlantic.com/health/archive/2015/01/rethinkin...

https://www.vox.com/2018/6/13/17449118/stanford-prison-exper...


If you ask me, all of civilization is a continuous exercise in repeating the Milgram experiment in more and more extreme ways.


> If you believe that ethics code is appropriate, then you believe those studies are valid and don't need to be run again to be re-verified or falsified.

Why? The experiments would still be unethical, even if they led to the opposite result.


Why?

If the opposite result is "some people are prisoners, some are guards, and they all sit around and have a jolly old time" then what's unethical about running that experiment?


> If you believe that ethics code is appropriate, then you believe those studies are valid

But why? The Stanford experiment was performative, it wasn't even an actual experiment, as shown by the experiment notes. How did you come to believe that it is mandatory to believe that study is valid?


> If you believe that ethics code is appropriate, then you believe those studies are valid and don't need to be run again to be re-verified or falsified.

This would only be true if the hypothesis being tested in these experiments was "psychology as a field needs a code of ethics". That wasn't either hypothesis.

"We need a code of ethics" does not imply "Milgram and Stanford prison proved ___ about authority figures".


Exactly, I can't understand what the author was thinking. If a paper shows that 2+2=4, what would the replication problem be? You write the paper out again and 2+2 turns out to equal 5?

"The results can't be replicated" is different from "the logic here is wrong". So different that this article starts from an entirely invalid premise.


From the post:

> More seriously, it’s reasonably well-known among mathematicians that published math papers are full of errors.[1]

I think the author's point is that actually the distinction is less clear: many math papers can't be (easily) replicated, yet people in math aren't too worried.

[1] https://twitter.com/benskuhn/status/1419281164951556097


But there's a difference between "can't get the results a second time" and "didn't even get the results the first time".

There isn't even so much a notion of "results", just "logically, Y follows from X". If that's wrong, you don't need to run the experiment again (there is no experiment to run), the logic is just flawed.


I don’t think it’s quite that simple. A math paper generally says 3 kinds of things: X is true, this is the high level idea of how we can prove X is true, and this is the detailed proof that X is true. As the post mentions, usually when there’s a mistake, X is still true.


Well, when there's a mistake found in a proof, the mistake needs to be rectified before the proof can be accepted. There have certainly been proofs that X is true, where X has turned out not to be true based on the fact that there's a mistake in the logic.

While it's possible a mistake won't be found, it's a much different problem than running empirical experiments and getting different results.


Right -- you need to run the argument again.


I agree with you. Don’t math proofs build on each other on the assumption that previous results are true? So if you have an incorrect conclusion, you might end up building an invalid map of related results until you find it conflicts (or you’re a superstar who triple-checks the meaningful dependencies you think might be particularly poorly scrutinized). In spirit, that’s not unlike scientific lines of inquiry in other disciplines that compound on the errors of previous results (aka the replication crisis).

It’s obviously quantitatively different in very important and meaningful ways, but the author I think is drawing an apt analogy.


It doesn't matter much beyond semantics. Math papers aren't experiments so they can't replicate.

However, math papers can still be wrong! Especially the actual proofs can be wrong. The question is, how many wrong proofs are out there, and how many of them have false conclusions.

If we accept that most published proofs have errors, why isn't that a horrifying failure of Mathematics that shakes fundamental trust in its systems? That is what the article is about.

This hypothetical shaking of fundamental trust would be highly analogous to the replication crisis. And there are more parallel arguments. So to me the premise makes sense, and I see little problem with some semantic wiggle room for sake of analogy.


Why should the Milgram authority experiment not be taught? If nothing else it's suggestive and might inspire new studies. The fact that the original design wouldn't pass a modern IRB doesn't mean the phenomenon is forever closed to investigation


Yes, of course, the Milgram experiment can be taught as a tragedy and a lesson in experimental ethics. But its "correct results" are almost always taught, even being published as a pop-science book, when they cannot be known because the experiment is too ethically problematic to be replicated in a modern setting.

Ironically, the "correct results" are often cited as the reason for the tragic suicide of one of the participants.


I'm just having trouble with "almost always"

Any psychologist worth their salt knows better than to claim any interpretation of experimental results as unassailable

Pop science is another story (literally) but the profs I've had the privilege of studying under spent a lot of time warning us against the shoddy reasoning you describe

Psychology has a lot of bullshit in its orbit but the discipline as taught in quality departments is just that, disciplined


I don't mean to impugn the study of psychology in the slightest. Only that I find it maddening that obviously unfalsifiable results are taught as though they can teach us anything about human behavior.

I hope you are right and I am wrong. I was taught these experiments this way at a good school; I can only hope that is more uncommon than I think it is.


Fair enough- and agreed. I hope so too.


I couldn't find anything about a participant's suicide when searching, could you provide a link?


I remember that from college but maybe I’m misremembering or maybe I’m simply wrong.

After a brief look around, I also couldn’t find anything about harm to the participants.


Nobody was harmed in the Milgram experiment. That's the whole point of the experiment. It is to fake harm. The one seemingly receiving the electric shock is an actor faking pain.

https://en.wikipedia.org/wiki/Milgram_experiment


I’m talking about the people who believed they were shocking the actor. That experience could easily create mental trauma.


If you replace math with (experimental) physics, you get an inductive framework, yet physics doesn't have the same problems psychology has.


Does the behavior of a particle depend in part on every other experience the particle has had in the past? Physics has things that we can't figure out, and things that have to be observed in the aggregate, but it's much easier to perform multiple experiments with the same base conditions.

If we could clone the participants of a psychology experiment prior to the experiment, with their memories and personalities intact, we could possibly narrow the gap. Or maybe not! But that would be a fascinating finding! ;)


Physics has calculable observation biases baked in. At least you can create experiments where you knowingly limit those biases based on some serious fundamentals. All other fields have so much more to worry about, it really isn't fair.


I think one of the major reasons is that physicists actually understand math and statistics and from the first year of university learn how to design, run and analyze experiments properly and quantify uncertainty.

This is a big difference to humanities or even biology.


See https://en.wikipedia.org/wiki/Oil_drop_experiment#Fraud_alle... and the following section. Physics isn’t immune to it.


The fraud allegations are bogus (according to Wikipedia). "Millikan's experiment as an example of psychological effects in scientific methodology" is in itself a psychological study. Has this Wikipedia source been peer reviewed and replicated? This is quibbling about fractions of a percent of the secondary measurement, the primary result being the quite obvious quantization of electric charge. And! Ultimately the charge measurement converged on the result known today within 10 years of his original experiment. That's what immunity looks like.


The replication crisis is not at all about some papers not being replicable or being fraudulent - the crisis is about the issue that in certain domains a huge proportion of the papers (e.g. half!) turn out to be not replicable and actually false.

I'd consider it appropriate to say that physics is indeed immune to a replication crisis unless it turns out that a significant proportion (e.g. more than 5%, one order of magnitude less than for psychology) of physics papers fail replication.


> the now unfalsifiable Milgram experiment

Total nonsense. Experiments are never falsifiable, what does that even mean? Hypotheses are falsifiable. Did any hypothesis of Milgram's suddenly become unfalsifiable one day in the last 50 years because the psychology research community found a new moral compass? Did the foundations of what is knowable by science shift during the night? "Now unfalsifiable" implies change. So what do you think changed, exactly?

You seem really hung up on a strict Popperian falsificationism that you have misunderstood.

In any case, the article mentions that you replicate a proof by reading it and checking the steps for yourself, so I feel your criticism about deductive reasoning is misplaced. It's implied that soft science papers require experiment to reproduce because they are inductive, just because the author doesn't spell this out for you doesn't mean they missed the distinction...


> Did the foundations of what is knowable by science shift during the night?

Yes! The moment the approval committee decides that certain experiments are no longer allowed, that makes some results no longer reachable by science, or at least not in a reasonable manner.


How far "science" has fallen, if this is how people now define the word.


This doesn't make sense. This is actually addressed in the piece. If inductive reasoning is irrelevant for mathematics, why then do mathematicians have excellent intuitions about what is true even before they manage to prove it? To quote the essay:

"[D]espite the fact that error-correction is really hard, publishing actually false results was quite rare because “people’s intuition about what’s true is mysteriously really good.” Because we mostly only try to prove true things, our conclusions are right even when our proofs are wrong."

Deduction can't explain this surprising reliability of mathematical intuition, only induction can. The question the author tries to answer is why the intuitions of mathematicians seem to be more reliable than those of social scientists.


I mean, again, this is a very messy thing to talk about explicitly because it's playing fast and loose with language.

I think the fact that our brains are computing/pattern-recognition machines is the obvious reason computing (deducing) is 'easier'. However, human minds are not actually mysteriously really good at math; that's just something academics tell themselves. Our brains are very clearly broken at some types of mathematical problems: gambling, large numbers, power laws, etc.

The number of times I've had to explain to my friends and family how exponential growth "will appear" with the spread of covid is testament to the fact that our brains don't actually have that math built in, probably because it wasn't particularly important for evolutionary survival and reproduction. The entire field of behavioral economics studies this delta specifically.
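To make that concrete, a minimal sketch with made-up numbers (a hypothetical 3-day doubling time, not real epidemiological data): constant doubling turns one case into roughly a thousand within a month, which is exactly the part intuition tends to miss.

    -- Hypothetical scenario: cases double every 3 days, starting from 1.
    cases :: Int -> Double
    cases day = 2 ** (fromIntegral day / 3)

    main :: IO ()
    main = mapM_ report [0, 6 .. 30]
      where
        report d = putStrLn (show d ++ " days: ~" ++ show (round (cases d) :: Int) ++ " cases")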

I'd say the deductiveness of a problem, itself, is what makes people have the intuitions. It's vastly easier to use deduction than it is to do induction... because induction is ultimately unknowable and effectively solipsistic.


Humans in general may be prone to all sorts of erroneous mathematical reasoning but the speed and consistency with which mathematical concepts reproduce in the minds of mathematicians can justifiably be called mysterious imo, at least until we better understand human language faculties


By your dichotomy, Physics seems closer to Psychology than math. Why aren't there replication crises in Physics? My answer is that there always have been-- it's just that they get resolved quickly. The problem with Psych is that it is SLOW.


There are replication problems in physics: http://astrobiology.com/2020/10/no-phosphine-in-the-atmosphe...

The entire point of the replication crisis is that we should expect statistical anomalies to occur, and the problem is that anomalous results are wildly more likely to be published. Which is why you probably hear about evidence of life on Venus but not about calibration issues.

It’s just generally easier and more problematic to do experiments testing framework boundaries in sciences with more variables like medicine, or anything to do with behavior.


I would assume that with physics, unlike psychology or anything that involves biology, it's generally much easier to eliminate variables.

Testing a new drug, for instance, is very difficult because there are about a billion chemical processes going on in the human body (or whatever body is being tested), and trying to distill the effects of adding one more, and making sure that the effects are correctly attributable to the right causes is something we can almost certainly never know 100%.

This kind of thing certainly happens in physics, and we can see results like neutrinos appearing to travel faster than light until they are corrected, but it's usually much easier to isolate the processes of interest in a physics experiment than in a biological system, or at least easier to eliminate most extraneous effects.


Because it is "trivially" easier to do a better experiment in Physics (money aside, of course)

Psychology will always have ethical, sample-size, and cultural/time/space constraints. Your "psychology experiment" done with Mechanical Turk is more an exercise in sampling bias than anything else

And they keep running into "replication failures" because they fail to account for that and try to chase the illusion that it is possible to have a perfectly controlled environment in psych. They can't, and they're only fooling themselves that they can.


No - both you and the OP miss the real problem.

The problem with Psych is that you can't really experiment on humans in a meaningful way. Animal studies for example don't suffer the same replication problems as psych.


> The problem with Psych is that you can't really experiment on humans in a meaningful way

While absolutely valid, this isn't the only problem: one other complication is that conclusions from experimental studies often don't apply to the real world, i.e. people behave differently in experimental settings. Thus in any case you can either have good control over confounding variables but very limited scope (experimental studies) or bad control over the setting but the ability to observe real-life behavior (field studies).

But it gets worse: two people with identical behaviour can have quite different motivations, and thus identical observable behaviour can result in quite different consequences. E.g. 5 hours a day playing games can be either healthy behaviour or an addiction/a means of suppressing thoughts or memories, depending on the person. So you'll always have that massive confounder called the mind, in every setting. Which means you have to inspect the mind itself, which only works indirectly, i.e. by interacting with the person, asking questions, trying to find out the motivation behind a particular action. And the person often doesn't really know, because a particular decision probably wasn't chosen consciously.


I've literally been experimented on and it was meaningful. The problem, in my view, is low sample sizes, poor selection[0], hyper-dimensional confounding data, and low rates of application. With math, comp sci, or physics, reality hits you in the face when you start to actually use it to build rockets or what have you.

[0] Guess when I was experimented on? Of course it was during university and of course I'm a white english speaker that lives in North America.


Well perhaps if animal studies were primarily concerned with investigating the interaction of reason and emotion in animals there might just well be :)


You may not have read very far into the post, because I think it deals quite well with these issues, if not in the same terms.


If you're actually interested in this topic, you should read "What is this thing called science?" by Alan Chalmers. One big take away from the book is that there are many things that are "factual" and "scientific" but are not "falsifiable".

It's been an important part of modern scientific advancement that falsification is a good, but not perfect way of learning about how the universe works.

For example, macro-evolution is basically unfalsifiable, and yet it is generally regarded as good science.


I don't believe I'm familiar with an Ingram experiment, any chance you could expand on this? I was also unable to find something with a quick internet search, however Milgram was suggested to me.


Milgram experiment, sorry... must have got hit by autocorrect there: https://en.wikipedia.org/wiki/Milgram_experiment

You can add the Stanford Prison Experiment (Zimbardo) to the list of, now unfalsifiable, experiments that inexplicably are regularly taught in university settings when they cannot possibly provide any useful data to the sciences: https://en.wikipedia.org/wiki/Stanford_prison_experiment

The idea that many if not most universities create educational programs based on these very obviously problematic studies leads me to believe that the scientific community can be as guilty of info-tainment bias as the general public. I only wish that I'd been able to continue in academia so that I could have at least a small voice in changing this problematic dynamic we live with.

They literally made a movie about the Zimbardo experiment in 2015. It couldn't be more obvious that people care more that it be true than whether it might be false: https://en.wikipedia.org/wiki/The_Stanford_Prison_Experiment...


what do you mean when you say "now unfalsifiable", is that because the experiments themselves are too unethical to be repeated?


The Zimbardo experiment, at least, has actually been falsified, so I don't think it's right to call it unfalsifiable.


That's correct, however students are being told about the manipulation these days. Especially concerning the prison experiment.


I think the author makes the distinction in the beginning.

> In experimental sciences, the experiment is the “real work” and the paper is just a description of it. But in math, the paper, itself, is the “real work”.

In other words, there is no replication crisis in math because there is no replication to be done. There is no experiment to be replicated, just work to be checked for correctness.


> the idea that the now unfalsifiable Milgram experiment is still taught in every university psychology dept should be outrageous

I agree with this point in general, but the Milgram studies have been replicated many, many times so potentially not the best example.

The Stanford Prison Study, on the other hand...


You could also argue that Math had its replication crisis in the 17th-19th centuries. E.g. infinite series "proofs" that were eventually shown to be flawed methodologies.

This and other crises led to grounding modern mathematics with set theory, the Zermelo–Fraenkel axioms, etc., and understanding what's possible (e.g. Gödel's theorem).

Psychology and other social sciences are barely a century old.


Other examples:

- The "Italian school of algebraic geometry" of the 19th century used intuitive methods that, while groundbreaking, ultimately proved to be unreliable and generated many false results. https://en.m.wikipedia.org/wiki/Italian_school_of_algebraic_...

- Mochizuki's abc "proof", which many of his Japanese colleagues seem to believe, but which most everyone else considers fatally flawed. https://www.math.columbia.edu/~woit/wordpress/?p=12220

Mathematicians have definitely gotten out ahead of their skis in the past, but I have the impression that the community today is incredibly good at finding flaws in flawed work and making solid work fully explicit and rigorous. It can take years or decades though.


Mathematics is having a replication crisis and people pay so little attention they don’t know.

That replication crisis has led to efforts in formal verification such as HoTT, Lean, etc.

https://homotopytypetheory.org/

https://xenaproject.wordpress.com/2021/06/05/half-a-year-of-...


No, by and large mathematics is not having a replication crisis. As the blog post in your link states:

> Question: Was the proof in [Analytic] found to be correct?

> Answer: Yes, up to some usual slight imprecisions.

This has been the case for almost all math formalization efforts. Even when (very rarely) proofs were revealed to be incorrect, the result was salvageable.


Note that fixing the "slight imprecisions" in a typical ad-hoc proof is still very valuable. Quite often it leads to a simpler, easier to understand, tighter, etc. argument as the proof can be freely "refactored" without changing its validity. This is the underlying reason why formalizing a proof that hadn't been done previously is considered actual, publishable work.


Not rarely. This talk shows several examples: https://www.youtube.com/watch?v=E9RiR9AcXeE

One of the more important results discussed was not salvageable.


> Not rarely. This talk shows several examples

I think you're underestimating the amount of theorems now that have been verified without issue. Those several examples are an absolute drop in the bucket.

> One of the more important results discussed was not salvageable.

I don't see any important result in that talk that was not salvageable. I see a lemma that had to be changed in support of a larger theorem, but again nothing that "broke downstream papers" so to speak. All the results ended up being fine for their purposes.


That's because most math proofs are treading on well-understood ground and only extending it slightly. E.g. it would be like a psychologist asking how the results of a well-proven experiment would differ if all participants wore red shoes.

When you enter truly new grounds mathematicians don't even agree if the distinctions being made have a meaning, let alone if they are true.


> When you enter truly new grounds mathematicians don't even agree if the distinctions being made have a meaning, let alone if they are true.

What examples are you thinking of? I can think of only one case where this has been the case (Mochizuki and the ABC Conjecture), but that turned out to have so much fanfare precisely because mathematicians did not agree on what distinctions were being made and this was the only time in living memory that this had occurred (the general consensus is that Mochizuki's proof is simply too obfuscated to make heads or tails of). As such the ABC Conjecture is not considered solved.

However, that is almost always not the case, even on the cutting edge of mathematics and even when making other breakthrough discoveries (e.g. Fermat's Last Theorem).

> E.g. it would be like a psychologist asking how the results of a well-proven experiment would differ if all participants wore red shoes.

And to be clear the major reason why fields like psychology are termed to have a "replication crisis" is because well-known results are being overturned, not just cutting-edge ones.

EDIT: I see now that OP also refers to the ABC Conjecture.


>What examples are you thinking of?

Back in grad school I was very much into types which let me do things sideways to how most mathematicians did them.

The worst one was the derivative of a derivative - NOT the second order derivative. Using R for the real numbers, the type signature of first (and any) order derivative is (R -> R) -> (R -> R). The type signature for what I was talking about was ((R -> R) -> (R -> R)) -> ((R -> R) -> (R -> R)). It didn't have very many interesting properties I could find but trying to explain it to anyone else in the department was like pulling teeth. They'd start thinking about second order derivatives every time and there is no common notation for talking about third order functions like there is for first order and (some) second order functions (derivative, integral, transforms, etc).
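A minimal sketch of the types in question (Haskell-style, with names of my own invention): the point is that an operator on operators is a different type from merely applying an operator twice.

    type Fn     = Double -> Double  -- R -> R
    type Op     = Fn -> Fn          -- (R -> R) -> (R -> R), e.g. the derivative
    type MetaOp = Op -> Op          -- the third-order signature described above

    -- A numerical derivative, as an Op (finite differences, illustrative only).
    deriv :: Op
    deriv f x = (f (x + h) - f (x - h)) / (2 * h) where h = 1e-6

    -- `twice` is a MetaOp; note that `twice deriv` is just the SECOND
    -- derivative (an Op), whereas `twice` itself is the third-order object.
    twice :: MetaOp
    twice op = op . op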

Mathematics only works as well as it does because mathematicians are all working on pretty much the same thing, not because there is some innate quality in mathematics that pushes mathematicians towards truth.


> The worst one was the derivative of a derivative

What you are talking about already exists, in many different versions at that. I think you really underestimate how varied the objects mathematicians work with are: they work with so many things that it is hard to come up with original ideas, and this one was already explored close to 200 years ago.

https://en.wikipedia.org/wiki/Fractional_calculus#Fractional...
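For reference, a standard worked example of what that article describes (the Riemann–Liouville half-derivative of the identity function; applying the half-derivative twice recovers the ordinary derivative):

    \frac{d^{1/2}}{dx^{1/2}} \, x = \frac{\Gamma(2)}{\Gamma(3/2)} \, x^{1/2} = \frac{2\sqrt{x}}{\sqrt{\pi}}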


The type of that derivative is still (R->R)->(R->R).

I also have a PhD in maths, thanks.


It isn't: it takes the derivative of the derivative operator, not just applying the derivative to a function several times. Fractional calculus is the study of viewing the order of derivatives as a continuous quantity rather than discrete, so there, taking the derivative of the derivative operator makes sense.

You could very well mean something different than what that article is talking about, but it isn't like I am just saying that a second order derivative is the same thing as what you are talking about. But if you mean something different than "the derivative of the derivative operator", then you aren't very good at explaining what you mean; you have to be more precise, as math is a wide field, and if you describe an object imprecisely then people will misunderstand.

> I also have a PhD in maths, thanks.

You yourself argued that people with PhDs in math get this wrong, so I don't see why you would bring this up. But the most likely scenario is that you just failed to communicate your thoughts properly since you used imprecise language. Maybe mathematicians should be taught more how to be precise with their statements, but usually they can rely on the crutch of old notation. I did invent new notation and solved some old unsolved problems that way in grad school, and the other mathematicians had no problems understanding what I wrote, so from my experience mathematicians have no problem understanding new things.


It IS unusual to worry about type signatures as a mathematician. Sounds to me like your PhD topic was closer to computer science than math? Or at least constructive mathematics ;-)


Mathematical physics oddly enough.

I got tired of having units mismatch in physics equations so I picked up a ton of type theory and used it for everyday work. Using it, rather than talking about it, has given me a very different perspective on it than anyone else I've talked to.


If you keep walking down that path you'll soon be labelled a crank by the old guard.

Developing a practical rather than a theoretical understanding of type theory rapidly makes you express intuitions which other people can't lex/parse.

Because you begin to think in functors/compositions, i.e. constructively; and as you've already pointed out, many of those functors don't have corresponding English nomenclature in a classical setting.

Find some Category Theorists to talk to instead.


> Find some Category Theorists to talk to instead.

Most mathematicians are also category theorists today. They were the ones who invented category theory in the first place, and today it is a basic topic most take in grad school and then use just about everywhere.


Very interesting!


>It didn't have very many interesting properties I could find

Usually notation is defined for higher order functionals when the need arises - when a new result is found to have interesting properties. Topological proofs on function spaces can use third order functions.

> Mathematics only works as well as it does because mathematicians are all working on pretty much the same thing

I don't think any person alive can understand all the major branches of mathematics well. By that measure mathematics is very broad.


So like, a functional derivative of differentiation?

Except like, instead of the 2nd order function returning a number, where the functional derivative would be ((R -> R) -> R) -> ((R -> R) -> R), uh, instead, the thing you got.

Ok, but, differentiation is linear, so

    \frac{1}{\varepsilon} \left( \frac{d}{dx}\bigl(u(x) + \varepsilon \eta(x)\bigr) - \frac{d}{dx}u(x) \right) = \frac{d}{dx}\eta(x)

(and so there's not even much need to take the limit as \varepsilon goes to 0, as it doesn't depend on \varepsilon anyway)...

Uh, that seems a little odd, that the derivative of differentiation in the direction of a function would be the derivative of that function? Like, the derivative of something linear should be a constant, ah, but, the directional derivative of 5 x_1 + 3 x_2 also depends on the direction in which it is taken, though it doesn't depend at all on where it is taken.

Ok, yeah, same situation here then.

Is that the sort of thing you are talking about?

Though, you said "of a derivative" not "of the derivative", so I guess maybe you mean like, more generally things that satisfy Leibniz's law?


This pretty much nails it.

If everyone commits to using approximately the same definitions then (at the social scale) you get normative semantics as an unintentional by-product. Everybody in your tribe uses the same terminology/notation and means the same thing by "==".

It's a desirable property because it minimises miscommunication and it enables effective asynchronous communication. Good luck trying to read a paper by somebody using different semantics.

And then computer scientists come along and point out that all definitions (and by proxy - all Mathematics) are arbitrary.

Because the chosen axioms are arbitrary.


Here's [1] the lecture where Vladimir Voevodsky talked about the problem and his experience with it but like the blog says, he didn't and they don't consider it a crisis. Even HoTT (and other TT) people present it as how things could be much better, not about how things are terrible.

1. https://youtu.be/E9RiR9AcXeE


I worked in applied math, in industry — as the math engine guy supporting economists.

The trials and tribulations of abstract math, where the little errors turn out to (mostly) be salvageable, translate into billion dollar mistakes in mathematical models in industry — errors that have caused a hiring freeze at a major tech company.

I respect that other people feel differently — I personally think there is a crisis in mathematics, where the mistakes/errors of Voevodsky et al. are the tip of the iceberg, but the most visible part, since they happen in academia versus industry.

I hope everyone can agree that:

a) it’s currently hard to verify mathematical models and proofs; and,

b) we could make that better — and currently are working on it.


But that's not a crisis of math, it's a crisis of business trying to use math that is not made for it. It's the Post Office Scandal all over again. Overreliance on shitty models, lack of quality assurance, etc. (Or if we look at it like the O-ring debacle of the Challenger accident, or the usual problem of one-off creations, like the Hubble, JWST, or any other public project that is late and over budget. And the F35 is also in this category, because it's also the problem of forcing a model on people that has so many free parameters that it's an endless argument about nothing.)

https://en.wikipedia.org/wiki/British_Post_Office_scandal


You’re trying to correct me about something I experienced first hand in my career — and you’re coming across as ignorant.

What you lament as “business trying to use math that is not made for it” is precisely the problem:

Mathematics can’t be reliably applied, even if you hire a dozen world class PhDs, hundreds of software engineers, and let them spend years trying to make it work.

That’s a crisis.


I think for there to even be a crisis there would have to be consensus on that.


How is math experiencing a replication crisis?

> This site serves to collect and disseminate research, resources, and tools for the investigation of homotopy type theory, and hosts a blog for those involved in its study.

> Exactly half a year ago I wrote the Liquid Tensor Experiment blog post, challenging the formalization of a difficult foundational theorem from my Analytic Geometry lecture notes on joint work with Dustin Clausen.

???


A more recent example is the Italian school of algebraic geometry, where it was discovered in the mid 20th century that many claimed proofs were faulty (and some “proven” results were incorrect).


But then again, molecular biology is younger than psychology and still doesn't have replication problems of the same order of magnitude.


Molecular biology has a big problem with people faking results. Even when the fakery is pointed out, journals are very, very slow to do anything about this.

Elizabeth Bik has put a lot of work into finding faked images in published papers. Her Twitter feed is pretty entertaining; she presents images and challenges readers to find the duplications. https://twitter.com/MicrobiomDigest

For more, see Nature: https://www.nature.com/articles/d41586-020-01363-z


Despite the title (a title with a question in it invites people to comment without reading the post, even more than the usual already high level), this is a really good post IMO, with valuable insights into not just mathematics but also the replication crisis elsewhere. (And it does discuss Mochizuki's claimed proof of the abc conjecture, and links to MathOverflow on formalization with proof assistants, to a recent paper discussing Vladimir Voevodsky's views, etc.) This from the first part is sound:

> The replication crisis is partly the discovery that many major social science results do not replicate. But it’s also the discovery that we hadn’t been trying to replicate them, and we really should have been. In the social sciences we fooled ourselves into thinking our foundation was stronger than it was, by never testing it. But in math we couldn’t avoid testing it.

But the post doesn't stop there! The second part of the post (effect sizes etc), with the examples of power posing and the impossibly hungry judges, is even more illuminating. Thanks!


This is a thoughtful and thought-provoking blog post. I think it's worth asking similar questions of computer science. I think you'll find some math-like patterns -- there's basically no chance Quicksort or any other fundamental algorithm paper will fail to replicate -- and some patterns which will fail to replicate, like in software engineering.

Some of the early results on pseudorandom generators and hash functions aren't holding up well, but I think that's just progress. We understand the problem a whole lot better than we did back then.

Perhaps more interesting is the literature on memory models. The original publications of the Java and C11 memory models had lots of flaws, which took many years to fix (and that process might not be totally done). I worry that there are a bunch of published results that are similarly flawed but just haven't gotten as much scrutiny.


There was that time when it was discovered that nearly all published binary searches and mergesorts were wrong. [1]

And yet, the concepts of binary search and merge sort are fine.

I think that's quite similar to the situation in math papers? Because math isn't executable, a math paper being "significantly" wrong would be like discovering that a program uses a fatally flawed algorithm and is trying to do the impossible. It can't be fixed.

Programs that can't be fixed seem rare?

[1] https://ai.googleblog.com/2006/06/extra-extra-read-all-about...
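
For the curious, the actual flaw in [1] is a one-line overflow. A minimal sketch (Python's integers don't overflow, so the 32-bit wraparound is emulated here):

    # the classic bug from [1]: in a 32-bit language,
    # mid = (low + high) / 2 goes negative once low + high > 2**31 - 1
    def int32(x):
        # emulate 32-bit signed wraparound (Python ints are unbounded)
        x &= 0xFFFFFFFF
        return x - 0x100000000 if x >= 0x80000000 else x

    low, high = 2_000_000_000, 2_100_000_000  # both valid 32-bit indices
    buggy_mid = int32(low + high) // 2        # negative: out-of-bounds access
    safe_mid = low + (high - low) // 2        # the standard fix
    print(buggy_mid, safe_mid)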


Rather than "wrong", I would describe those implementations as "not fully general". They work perfectly when `n < some large number` as opposed to `n < some large number * 2`. The latter is the best you can do with the function signature, but that is somewhat arbitrary. You could easily choose a 64 bit index and exceed all practical needs.


Many of these examples aren't really in C but in C-like pseudo-code. In that domain they are perfectly accurate. Even in C, 'int' bitlength is defined by the implementation: int only needs to be 16 bits or more, but it could easily be enough bits to exceed the number of known atoms in the universe, in which case overflow is impractical.

I'd say that's an example of the difference between science (the authors don't need to show every detail of a practical implementation and can assume infinite bits in int) and engineering, where you do need to make such considerations.


From my experience in ML, I'd suspect that the "crisis" isn't that the research is false so much as it's useless (algorithm x with parameter set w happens to work well on one particular dataset, conclusion: I have revolutionised the field).


This isn't unique to ML. A lot of research is about adding an epsilon to an existing paper, which probably doesn't interest anyone except a small community working on its very own niche topic.

But does that mean there's a crisis? Maybe that's just a way to foster an environment that will let great ideas emerge.


I’d rather be in the world where we have too many papers tweaking the details of power posing and exactly measuring how much each contributes to the effect. At least we’d know the effect is real.


This course of events doesn't play out in math, though.


The parts of CS that are the most math-like (which include fundamental algorithms) don't have a replication crisis, but the ones that are the most social-science like probably do, or would. I would bet large sums of money that a lot of the literature on stuff like "does OOP lead to better code", "does IDE syntax highlighting lead to fewer bugs" etc. would fail to replicate if anyone bothered trying.

The thing is, the general sense I get is that people in CS already have so little confidence in these results that it's not even considered worth the time to try and refute them. Which doesn't exactly speak well of the field!


I worry about ML papers in particular. Models are closely guarded, and often impractical to train independently because of ownership of the training/test sets, the computing power required, or details left out of the paper. There's no way to mathematically prove any of it works, either. It's like social science done on organisms we've designed.


> there's no way to mathematically prove any of it works, either

Or there is, but then you're doing statistics not just ML.


Some measurements are interesting and valuable without being replicable. For example, the number of online devices or the number of websites using WordPress. Take the same measurement at a later point in time and the results are different. Yet I wouldn't call those fields maths-like.


Research into this stuff is very young and so I think it's fair to be skeptical of the results. I'm hoping we'll eventually come up with more rigorous, reproducible results.


Anything math-provable is objective. Anything depending on human opinions, like "what's better code", is subjective, and thus suffers from replication problems.


In competitive programming you could basically assume the pseudocode in a paper is not literally correct and requires some tweaking to work, despite a “proof” of its correctness. Particularly with string algorithms.


long time no see!

there's a couple levels there:

rote translating pseudocode into your target language isn't likely to pan out well.

so instead you run the pseudocode in your mind, develop an intuition on how it works, and that's the "replication" bit this post talks about with reviewing math papers.

but both the pseudocode and your code will likely have edge cases you didn't handle. this isn't a problem for math - that's the category of common trivial/easily fixable proof errors that don't really affect the paper. but they're a problem for machines that run them literally.

maybe a good compromise strategy for formal verification is to declare the insight of the algorithm - recurrence relation or whatever - as an axiom, and then use the prover to whack the tricky edge cases.
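
something like this toy Lean 4 sketch (my own invention, with a textbook Fibonacci identity standing in for the "insight"):

    -- toy sketch of the compromise: take the key insight on faith as
    -- an axiom, then let the checker grind the mechanical parts
    def fib : Nat → Nat
      | 0 => 0
      | 1 => 1
      | n + 2 => fib (n + 1) + fib n

    -- the "insight": a known Fibonacci addition identity, axiomatized
    axiom fib_add (m n : Nat) :
      fib (m + n + 1) = fib (m + 1) * fib (n + 1) + fib m * fib n

    -- concrete edge cases still get checked mechanically
    example : fib 0 = 0 := rfl
    example : fib 10 = 55 := rfl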


Yes, I'm convinced tons of published results are flawed! I heard top researchers tell their students "don't spend too much time on the proofs, nobody reads them". And many CS papers don't get a lot of attention. But it's not necessarily bad; other researchers build on top of this work and results consolidate over time.


Isn’t this a misunderstanding? I suspect they rather meant to avoid spending too much time on language in these parts.


No, it's not. In that specific case, the supervisor thought the value of the paper didn't lie in the proofs, plus it was a rank-B conference. He would rather have his student work on a different paper than spend a week on the proof.


Math doesn’t have a replication crisis at all; it has a comprehension crisis.

Since proofs are programs one can basically say that mathematical theorems are incredibly detailed software that is completely open source and invites people to identify programs that don’t work and or fix issues.

A famous one is Fermat's Last Theorem, which needed a fix but was largely right.

Others have said that it takes 6 months to a year to get published. The other thing with math is the fact that you can get completely scooped and your work is worthless.

Edit: I am using "proofs are programs" very loosely and yes Theorems are much more than programs as other commenters have pointed out.


A more notorious example of the comprehension crisis would be Mochizuki's claimed proof of the abc conjecture. So far fairly few people are willing to claim they both understand and agree with the several hundred pages of 'proof'.


I was tempted to use that but Fermat's last theorem is known to the general public for much longer and has a resolution.


It was a good example, and definitely one of the more important examples of how finding and fixing mistakes in mathematics is supposed to happen. I figured the abc-conjecture would provide a nice contrasting example.


https://en.wikipedia.org/wiki/Shinichi_Mochizuki

It's a fun rabbit hole to go down :)


The math department at my university tried to read through the proof of Fermat's last theorem, as a for-fun activity. They eventually gave up because they realized it would take too much time.


Interesting! My experience is that scooping is less of an issue in math than in any of the science fields I have friends in. Papers are lower-stakes, there's less money involved, and if two of you are working on the same project you can just co-author.

(And if you have an independent paper, that can _also_ get published; your paper is distinct even if the result isn't. I think the HOMFLY-PT polynomial was independently proven in something like four different papers published within two years, and it's named so that all eight authors get credit.)

But also, publication lags shouldn't lead to more scooping, because you can put it up on the arXiv at the beginning of the publication process, not the end. In my experience the paper is treated as "real" once it hits the arXiv; the acceptance is mostly a formality that lets us put it on our promotion packet.



> Since proofs are programs one can basically say that mathematical theorems are incredibly detailed software that is completely open source and invites people to identify programs that don’t work and or fix issues.

I don't think it's that straightforward; proofs in papers are a mix of natural-language explanation and mechanical steps. Not every step of deduction can feasibly be written out. That's part of why computer-aided proofs are not that popular in math.


Having a paper published can take a very, very long time; a year is quite short, and 6 months is basically the minimum wait. My last paper took 2 years to be accepted from the first time I submitted it, despite being largely the same as the initial submission, accepted as part of my thesis over a year ago, and having several citations. It is very frustrating, and it also means that easier (and less original) work is easier to publish.


An interesting perspective on programs-as-proof is I-forget-his-name-but... the mathematician who made really bold claims, if only you'd study under his tutelage for a number of years to understand this whole new terminology he invented.

With programs-as-proof it really wouldn't matter. It's either "computer says yes" or "compu'er says noooo".

EDIT: Whoop, sibling post mentioned, it's Mochizuki.


This doesn't seem a very generous description of Mochizuki's work. You don't "need" to study under him for a number of years and there's no evidence he's being obscurantist. The proof is long and has a lot of novel techniques he's invented, and he works primarily in Japanese. You can reasonably side with e.g. Scholze's interpretation without thinking Mochizuki is disingenuous or some kind of scammer.

He's considerably less esoteric than e.g. Grothendieck was even during his more "public" years.


I have nothing invested in whether or not any given mathematician is right or wrong. I just picked a random example of a controversial proof -- the point was more that proof-as-computation could settle any and all disputes.

It might not lend more understanding to people not invested in "field X" (or even people who are invested in field X!), but it would be proof.

Proof in the current world of math is quite intangible.


I think the demand for "tangible" proof (if by that you mean, fully mechanically-verifiable proofs and a style unlike anything common in mathematics papers today) is a bit silly and seems to be driven by some ideologies well outside of mathematics, rather than mathematicians themselves.

A proof is whatever convinces enough mathematicians that the theorem follows! Classical logical systems are a very good way to do that, so they get used a lot. But they're not the only way, and involving a computer program makes most proofs less convincing rather than more.


> It's either "computer says yes" or "compu'er says noooo".

Sadly not; sometimes it is just:

computer says


Or "computer says x but you don't trust the hardware/software".

Or "computer says x but you don't agree the mathematical concept was correctly formalized into the proof engine language".


> The other thing with math is the fact that you can get completely scooped and your work is worthless

Why math specifically? One would think this applies in virtually all fields.


Increase in development time (publication can take 6 months to a year).


> Since proofs are programs one can basically say that mathematical theorems are incredibly detailed software that is completely open source and invites people to identify programs that don’t work and or fix issues.

I would say it's more like pseudocode. There can be quite a large gap between a normal proof and a machine-checkable proof, which is the computer-program version.


Aren't programs generally more complex than math proofs?

Like, the more accidental edge cases people produce, the less they understand the program.


Not any proof with quantifiers. As soon as a statement contains ∀ (for all) or ∃ (there exists), you have a way bigger branching factor, and the claim after the quantifier needs to be true for all (or at least for one) of the elements of the set you draw from.
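
A rough illustration (mine) of that branching: even brute-force checking a single ∀∃ claim over a small finite set multiplies the work out over the whole domain.

    # brute-force check of "for all x there exists y with x + y == 99"
    # already costs up to |domain| * |domain| candidate checks
    domain = range(100)
    claim = all(any(x + y == 99 for y in domain) for x in domain)
    print(claim)  # True, after up to 100 * 100 checks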


Yes, what I meant was, math proofs are well defined, and programs are often a heap of stuff thrown together.

While it's harder to produce the math proof, it's probably harder to grasp what's going on in a program in a mathematical sense.


The root of the replication crisis in social sciences is not just that many papers fail to replicate, but that there is no way to clearly resolve a result that fails to replicate. A paper claims that pre-K increases test scores a decade later, another paper claims it doesn't, and there's no clear resolution. The disagreement just festers, with both sides citing research that supports their opinion. The argument often "spills out" into the public sphere.

In mathematics and computer science, there are many errors in published papers. However, once you point out an error, it's usually pretty straightforward to resolve whether it's really an error or not. Often there is a small error which can be fixed in a small way. Exceptions like the abc conjecture are rare.


> When I’ve served as a peer reviewer I’ve read the papers closely and checked all the steps of the proofs, and that means that I have replicated the results.

The side effect is that math papers have an insanely long time to publication. Perhaps 6 months, or a year or more if you are unlucky.

In physics, the publication time is more like 3 months: something like 1 month for the first review and then two months for making small changes suggested by the referee and discussing with the editor.

As a side^2 effect, some journal citation indices count only the citations during the first year. But the papers containing those citations are sleeping on the reviewers' desks during that year, so the number is lower than the real one.


My completely subjective opinion is that at the highest levels of math, there are only a handful of people that are even capable of peer reviewing, and their time is in high demand.

Wiles's proof of Fermat's Last Theorem is like 120 pages long and he first delivered it disguised as a class to a bunch of grad students who barely understood any of it and hence gave no feedback. Because this is Fermat's Last Theorem which is famous, eventually people in the math community that understood Wiles's work reviewed it and found an error. Had it been a 120 page proof of some not famous problem like random chessboard thought experiments, it probably could go years without anyone seriously looking at it.


It seems that a lot of commenters here have not read the post, and are simply posting their own opinions about how to answer the question in the title. If that is you, and you are interested in the question, I recommend reading it in full -- there are a lot of interesting points there.


Maybe there isn’t a math replication crisis, but there is a kind of math fragmentation crisis. Sub-fields of mathematics have gotten so deep that specialists can’t communicate with one another [0]. I’ve heard this expressed by other mathematicians as having only 2 or 3 other people in the world that they can discuss their work with.

The level of solitary study necessary to make progress is another sign of this phenomenon. Andrew Wiles famously spent six years alone in his attic to come up with his proofs for Fermat’s Last Theorem.

[0] https://sites.math.rutgers.edu/~zeilberg/Opinion104.html


In Soviet Russia, mathematicians, physicists, etc came to the same conclusions as their enemies in the USA.

However, the social science results, although unified in each country, were almost always different and in conflict.

Social sciences are politics and nothing more. It’s an opinion of how the world ought to be.


because math doesn't do experiments.


Or as noted in the article:

> But one of the distinctive things about math is that our papers aren’t just records of experiments we did elsewhere. In experimental sciences, the experiment is the “real work” and the paper is just a description of it. But in math, the paper, itself, is the “real work”.

And

> And that means that you can replicate a math paper by reading it.


I think that means that the word "experiment" isn't the right term for what most mathematicians do.

I'd say most times it's "modeling", not "experimenting"


This seems the obvious answer. The replication crisis isn't about published material being wrong, it's about the inability to reproduce the results of experiments or studies in a repeatable fashion.

It's not like you make a hypothesis in math and then need to go away and interview a sample of 1,000 circles and report back that, controlling for ellipses that may be misreporting as circles, the ratio of the circumference to the diameter is 3.2 +/- 0.1 (p<0.05).
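
(Though if you really did want to run that parody study, a tongue-in-cheek Monte Carlo sketch gives you exactly that kind of estimate, confidence interval and all:)

    # an "empirical study of pi": throw random darts at the unit square
    # and count how many land inside the quarter circle
    import random, math

    n = 100_000
    hits = sum(random.random() ** 2 + random.random() ** 2 <= 1
               for _ in range(n))
    p = hits / n                         # fraction inside the quarter circle
    se = 4 * math.sqrt(p * (1 - p) / n)  # standard error of the estimate 4*p
    print(f"pi = {4 * p:.3f} +/- {1.96 * se:.3f} (95% CI)")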


Exactly. A math study is not a "study" in the sense of "hey i saw a funny pattern in some data maybe it's a sign of my pet theory" - it's literally already proven when published. There's nothing more to do.


There are a lot of math papers / work that are "hey I saw a funny equation/morphism/shape/etc., what does it 'do'?", presenting a lot of conjectures / constructions but weak on anything conventionally called a theorem. https://erich-friedman.github.io/packing/ is probably one of the best known on HN.


https://www.experimentalmath.info/ would beg to differ, I think.


"Mathematics is the part of physics where experiments are cheap." - V.I. Arnold


Until you start multiplying infinities on your auto scaling AWS cluster


This is my take.

Science attempts to describe reality. Math attempts to create rules/axioms.

They're not the same pursuit, although they can often be useful together.


It does (especially using computers); it's just that experiment is not the sole criterion of truth, as it is in, say, physics.


I think it's a good article.

But it's worth considering the possibility that mathematics could have fallen, and could still fall, into a state where false results are frequently published, and isn't 'protected' by anything special in the nature of the field or its practitioners.

Just as you might find yourself asking "why did city A fall into the grip of organised crime, and not city B?". You might look for answers in the methods of police recruitment or a strong history of respect for the rule of law or anything like that, but it might turn out that the answer is really just "city A got unlucky".


The social science and medical replication crisis seems like it would be far more impactful than a mathematics crisis, right? Politicians, policy-makers, doctors, etc. all make decisions based on potentially flawed or outright incorrect studies in a way that I don't think is true for the equivalents in math, simply because there aren't decisions and policies up for debate related to much of them (if I am wrong about this, please correct me).


If a flawed mathematical paper were used as the basis for what then became a flawed cryptography algorithm, I can see that having impact if the bad guys noticed the flaw first. But yes, I expect examples like that would be comparatively rare.


In cryptography the math is almost always the strongest part, and it is the side-channel attacks and implementation mistakes that let the bad guys in. When it is the math, the flaw is often that the algorithm has all the desirable properties proved in a number of papers, but has some exploitable structure that analysts can turn into an attack.


Thanks for this, cryptography is a good example where this could be a problem.


I think politicians generally don't make policy decisions based on science. I live in the UK and an obvious example is drug policy. They even fired the scientist whose drug research they didn't like.

But if the science agrees with a decision they've already made then they're happy to use it for justification, even if the science is junk (e.g. the crazy fines for taking children out of school in the UK).


Because math isn't a science, there is nothing to replicate. If the analysis is sound, that's it; there's no question of whether the physical results the analysis was applied to are fraudulent, measurement error, due to improper setup, etc.

Why is this even a question? Analytical and empirical fields face fundamentally dissimilar challenges.


One time when I was still a sophomore, a friendly professor shared a story about how one of his colleagues had a math paper accepted, and found an error in the proof only after it was published. So he had another paper in that journal next, with the refutation of his earlier work. And that feat was worth double the point score awarded to the institute by the scientific powers that be.

I'm sure the excuse is that errors in math papers these days don't have the same kind of impact as mistakes in, let's say, medicine.

Later on, my guess was that this would change when we have editor software that is easy to use, WYSIWYG, renders math as well as LaTeX, and where the edited document is actually integrated with the formal proof beneath the fancy, elegant phrases and formulas.

Then I quit a job and enrolled for phd studies but that's another story.



I wanted to post the same thing, so was looking for a top comment addressing it and glad to have found it. It's weird how many people, even on HN, do not get this distinction. Math gets thrown in with all the sciences but it itself is not a science - it doesn't have experiments, it doesn't follow the scientific method.

Mathematics cannot have a replication crisis because there is nothing to replicate. In math correctness is tested not by redoing something but by double-checking the original work for mistakes.


There is one corner of math that does have a replication crisis. Just as we compare programming languages by how "ergonomic" they are to learn and use, mathematicians do come up with novel notation systems to try to improve the ergonomic state of their field, and since "ergonomics" is another way to say "esthetics", and is proved or disproved by user testing, that is where replication gets hard.

The inventor of category theory's wiring diagrams, for example, has claimed that he could get middle schoolers to understand them. I suspect that success has not been replicated.


> The inventor of category theory's wiring diagrams, for example, has claimed that he could get middle schoolers to understand them. I suspect that success has not been replicated.

Maybe I should try. I like category theory and I am a middle school teacher and like teaching electives on improbable things :D


The replication crisis comes down to funding. If you publish bad results in maths, you lose funding and get fired and humiliated. If you publish bad (but exciting or popular) results in social sciences, you get more funding and acclaim. All subjects are basically (sadly) products. Maths, physics, and engineering are products that have to (mostly, eventually) actually work. Politics, sociology, gender studies, etc. are more about entertainment. Some fields (e.g. nutrition) are a hybrid of the two: some serious factual research, some fads and fashion.


Mathematics is fully deductive (so it is not strictly speaking a science - it is not empirical, but its own kind of discipline, like logic, if you don't consider that part of mathematics), so replication can be attempted by machine, and as the author writes, proof-checkers find more and more mistakes. A water-tight proof essentially is the trace or log of a series of mechanical deductive steps.

But because journals publish human proofs, one of the following cases can arise:

- A proof can be wrong and the underlying conjecture can still be true (i.e. a theorem).

- A proof can be wrong and the conjecture can be false - these kinds of errors need to be corrected with utmost urgency, because they lead to follow-up downstream errors.

- A proof can be right and recognized as such by peers, in which case it gets published and everybody is happy.

- A proof can also be right and be contested by peers. This happens when proofs have gaps that are "obvious" to some, but not believed by others: a proof is a sequence of steps that are either previously derived or self-evident, and people simply differ in what they consider self-evident. (As an aside, in school I was criticized in maths for "skipping steps" and in English for "jumping thoughts"; to me it seemed obvious where things were going, but the teachers obviously needed "comments to the code", so I learned to insert baby steps, and everyone was happy.)

Because of the last case, it is important to get down to the most formal, fine-grained, nitpicky, atomic level, so everyone can agree that each micro-step is self-evident, and thankfully we can delegate this to machines nowadays (provers like the "Isabelle" system - https://www.cl.cam.ac.uk/research/hvg/Isabelle/), at least to an extent. Ironically, that's not what real proofs in mathematics journals look like at all. They're written in prose, and often in a surprisingly "meta" style (we had algebra lectures where the professor talked about "colored roosters" and how they behave, which is the name of some structure in the algebraic sub-field of Ramsey theory).
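
(To give a taste of that atomic level, here is Lean 4 rather than Isabelle, purely as an illustration: even a trivial fact gets spelled out in micro-steps the machine can check.)

    -- every micro-step explicit and machine-checkable
    example (a b : Nat) (h : a = b) : a + 1 = b + 1 := by
      rw [h]  -- rewrite with the hypothesis; reflexivity closes the goal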


My summary: in social science, measurements are coarse, which means the statistics cannot evaluate subtle changes. In math, logic is very precise, so subtle changes can be measured.

The author makes the point that both social scientists and mathematicians are trying to prove subtle things which are actually probably true. In other words, the social science replication crisis arises because the experiments are often impossible to perform consistently when the effect is subtle, leading to the use of inconsistent lucky draws to demonstrate things.
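
(A quick simulation of those lucky draws, my own sketch with invented numbers, shows how a small true effect plus a significance filter inflates the published record:)

    # a small true effect, many noisy underpowered studies, and a crude
    # "p < .05" publication filter: the published record overstates it
    import random, statistics

    true_effect, sd, n = 0.1, 1.0, 20
    published = []
    for _ in range(10_000):
        sample = [random.gauss(true_effect, sd) for _ in range(n)]
        mean = statistics.fmean(sample)
        se = statistics.stdev(sample) / n ** 0.5
        if mean / se > 1.96:  # only "significant" results get published
            published.append(mean)

    print("true effect:", true_effect)
    print("mean published effect:", round(statistics.fmean(published), 2))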


Why not turn the premise of the article around? Instead of suggesting that math might also have a replication crisis, why not question whether the whole replication-crisis thing was overblown and effectively more of an ideological attack on a few disciplines?

Errors in science, disagreement, and lack of reproducibility are, I think, common and prevalent, but that doesn't necessarily imply that a discipline as a whole doesn't make progress. The obsession with statistical accuracy and the 'science as bookkeeping' mentality seems fairly new to begin with, and science did just fine before we even had the means to verify every single thing ever published.

It kind of ignores the dynamic nature of science. Most of what is published probably has close to zero impact regardless of whether it's right or wrong, but paradigm-changing research generally asserts itself. Science is evolutionary in that sense: it's full of mistakes but stumbles towards correct solutions at an uneven tempo. In a sense you can just look at it like VC investment. Nine times out of ten individual things don't work, but the sector overall works; the market economy is full of grifters and failed businesses, but it doesn't matter that much.

So, maybe half of math is bullshit, but so is everything else; in math people just say "whatever" until they find something good, whereas in psychology people use it as an opportunity to hack away at it.


> It kind of ignores the dynamic nature of science. Most of what is published probably has close to zero impact regardless of whether it's right or wrong, but paradigm-changing research generally asserts itself.

The problem is that psychology, unlike physics, doesn't really have a paradigm. There's the old saying, "Extraordinary claims require extraordinary evidence." But for that heuristic to be effective, you need a standard for what counts as extraordinary. Extraordinary compared to what? In physics, there are two well-established paradigms (relativity and quantum mechanics), which establish what counts as ordinary and what counts as extraordinary. So, for example, if you're claiming that the distribution of dark matter in the cosmos is more clumpy than predicted by existing models, or that the energy level of a particular field is 12 MeV rather than 10, those are ordinary claims, which can be accommodated by tweaks to the existing paradigm. But if you're saying that the speed of light has varied over the history of the universe, or that all subatomic particles are actually tiny vibrating string-like structures, well, that's going to require a lot more evidence.

In psychology, it's much more difficult to have that kind of intuition. Take the concept of priming, for example. Is claiming that people walk more slowly when they're encouraged to think of things that make them feel old extraordinary? It makes a certain sort of intuitive sense, but, on the other hand, there's absolutely no causal mechanism suggested. So when a number of priming studies fail spectacularly under replication [1], I don't know what to think. I don't have a good sense for how much of psychology is overturned by the replication failure, in the same sense that I'd have for physics if it turned out that e.g. the speed of light is a variable rather than a constant.

[1]: https://mindhacks.com/2017/02/16/how-replicable-are-the-soci...


Psychology currently is just a pseudoscience, really. It is at the same stage general medicine was at before the microscope was invented and bacteria were discovered: they try to analyze phenomena, but they have not discovered the underlying principles yet. My bet is: the real advancement in treating mental disorders will come from AI/neural-network researchers, not psychologists.


> In physics, there are two well-established paradigms (relativity and quantum mechanics), which establish what counts as ordinary and what counts as extraordinary.

Actually, I would say that these well-established paradigms establish what counts as extraordinary. In other words, relativity and QM are examples of extraordinary claims that we believe because we have extraordinary evidence for them. Both of these theories say all kinds of extraordinary things, and most people who first encounter the theories start out thinking they can't possibly be true. We believe them not because they are just ordinary, but because we have taken the time and effort to accumulate extraordinary evidence for them.

In that light, the replication crisis in other areas of science is easily explained: they allow extraordinary claims to be published without the extraordinary evidence that those claims would require. So of course many of those claims turn out to be wrong.


The difference between the two examples you have it that venture capital and the market economy openly embrace their flaws, whereas science and academia refuses to acknowledge (or manage) the "humanness" of the system and projects a hyper-enlightened ideal both internally and to the outside world.


This isn't turning the problem around in the same sense. To effectively turn a problem around you need to use its complement, and then it should be a binary proposition. Your example, where you suppose an ideological attack, fails here because it is not the only other explanation; you haven't turned the problem around in the way that "what is the probability this action had an effect" can be turned around into "what is the probability that this action had no effect".


Unrelated, but the replication crisis in medicine is the primary reason why anti-vaxxers have a right to be skeptical, even if they're wrong and spout nonsense. Now we *know* that a large percentage of cited medical research is fraught with uncertainty and is potentially non-replicable. Therefore, people have a right to be skeptical of "the science" even if they don't have a coherent counterargument. It is extremely reasonable to be suspicious of statistical methodologies in vaccine safety studies and adverse reaction reporting; there are statistical errors and even outright fraud all over the scientific landscape, so why not here? And given the extremely perverse financial and political incentives found in this particular pandemic environment, I would be extremely surprised if there was NO foul play happening, because you would have to be a hero to ignore these incentives.


When I was getting my PhD in chemistry, the replication crisis had not been recognized (widely, anyway). As it has become more widely discussed, I've vacillated between being annoyed that people don't make the distinction between the hard/simple sciences and the fields that study much more complicated, multi-causal topics, and being self-critical that perhaps chemistry also suffers from a widespread problem.

I still don't know the answer, but I suspect there is something to the simple/complicated distinction.

I do recall that there was this one professor who couldn't replicate the seminal work that basically got him tenure, and it was all very dramatic. The kind of story that we would share with each other as a byword.

Also, I wonder if errors get caught more quickly because in chemistry, and in physics to a degree, most progress is built directly on previous results. So you can't have bad results just floating unnoticed for very long.


The thing is, mistakes will always be made and corrected. Maybe the experiment worked the first time because someone smoked in the room yesterday and a trace amount of contaminant made it in the solution and catalysed a reaction. Maybe your supplier had a minor contamination on your reactants. The reasons are endless.

The issues I have with the "replication crisis" are twofold: malicious publishing of false results for personal or organisational profit, and malicious interpretation of prestigious publications as the one and only immutable truth. Both of these are social and political problems.

Scientific publishing has functioned fine so far for the same reason the old internet was full of true information even though it didn't have to be: there was little to be gained by lying. Now that government policies are built on PhD papers, things are changing.


Just a fun fact: there are dozens of papers released each year which claim to have proven that P = NP, or that P ≠ NP. E.g., they "prove" it with soap bubbles: https://arxiv.org/pdf/cs/0406056.pdf


Well to be honest, things on arxiv are not properly speaking “papers” in the academic sense. At least in Maths.


There are lots of cryptographic protocols with mathematical proofs that contain errors that break the security of the protocols themselves. A version of MuSig in Bitcoin was a protocol like that, though luckily the mistake was caught by the cryptographic community when the authors wanted to publish the proof at a conference.


"A mathematician's fame is proportional to the number of incorrect proofs they have published."


Because practical applications of such work are usually several decades away. A bad medical study can kill people today. There's not a lot of incentive to check, or to complain about consequences far in the future.


Very interesting! As a social scientist, the replication crisis gets me seriously down, making me doubt the integrity of my field. It's good to see that there's not such a problem in maths...



"We get away with it becuase we can be right for the wrong reasons—we mostly only try to prove things that are basically true."

Apart from the replication crisis, the other crisis that is not really talked about is funding. Academic funding basically comes from one of 3 (connected) sources - government, corporations and the military. Somehow or other - these 3 sources have pretty much the same or non-conflicting aims. These aims relate to power and control. This is actually the largest crisis, IMO.

Given that information, we can re-assess the quoted statement the author makes.

Perhaps it's not that they are proving things that are "basically true". It's that right or wrong do not matter. What does matter is that the answers provided meet the agenda of those funding the study. The answer is not that important as long as it is supportive of whatever agenda is in play. I believe this is the case for the replication crisis in science also.

A replication "crisis" is only a crisis if you are attempting to achieve truth and greater understanding. But truth and understanding are only ostensible reasons, not the actual ones. What these studies are actually doing is creating a parallel construction - the aim is actually for studies to appear 'truthey', without actually being so. What studies should actual do is increase the funder's power, wealth extraction abilities, etc.

If you doubt this and think that truth matters, consider this: surely we should have cracked the best diet for people by now? But there is no common understanding of what is good or bad to eat - if anything, there is more confusion. The reason, of course, is that there's no money in recommending whole foods or whatever. However, there is money in drugs to make people 'better'. And money in making diet so confusing that people eat themselves into trouble.

Anyway, if you are in the business of governing or monetising the masses, truth and understanding is the last thing you want. Far better to have a story that gives you control, or extracts money. Such is life under fascist governance (where fascist = corporation + governance working together).


The replication crisis is not largely due to the math/stats used in studies.

It comes from trying to make inferences from noisy and often confounded experiments with limited data.


I like this sentence from the link:

> Many papers have errors, yes—but our major results generally hold up, even when the intermediate steps are wrong!


Not being a formally trained mathematician or psychologist but being someone who’s seen plenty of organization/institutional mechanism design:

is it possible that, e.g., psychology research might become more mathematically rigorous if there were more incentives for serious mathematics people to study psychology?

I appreciate that at first glance that might seem a bit vacuous, but there's a ton of money and way better policy at stake if we understood behavioral economics better, and that's at least an adjacent field to psychology, right?


A somewhat cynical take is that it is harder to mislead in theory. Your mistakes will get caught, and you risk losing your reputation, so it's less likely that someone will publish a proof without double- and triple-checking. Theory subjects have faster verification loops.

Verification and review are harder in experimental work, and it might take a few decades before someone finds an error, or even collects the resources to verify a claim. So maybe it's harder to fight the 'publish or perish' demons in the experimental sciences?


How can there be errors in math papers if math formulae are trivial to check formally using a computer?


Math formulae are not trivial to check formally using a computer. Mathematical software is in its infancy relative to the sophistication of mathematical objects defined in the literature.


Because the current state of the art in automated proof checking is still trying to check all the proofs you'll see in an undergraduate program, and is nowhere near able to check state-of-the-art proofs.


"All science is either physics, or stamp collecting." (Ernest Rutherford)


The article's linked article on social priming has a fun typo:

4.10 The “Lacy Macbeth Effect”


In mathematics there is no measurement uncertainty.


Because math doesn't rely on experiments.


2+2=4


This gets at the heart of the difference between science and math. What actually is science? And what actually is math? You may think you know, but most people don't.

Did you know? That in Science, and therefore reality as we know it... NOTHING can be proven to be true. Proof is the domain of logic and mathematics, NOT science.

There is a huge difference between science and mathematics.

Math is an imaginary game involving logic. You first assume logic is true, then you assume some additional things, called "axioms," are true. You take your theoretical universe to encapsulate only logic and the axioms; that is everything that exists in it. You then derive, or prove, statements that you know must consequently be true in your made-up universe based on logic and the axioms. These statements are called "theorems." That is math...

For example the "pythagorean theorem" is a statement made about geometry assuming that the axioms of geometry are true and logic is true.

So of course if it's a logical game, there's no real replication crisis. All theorems in mathematics are proven true based off of logic. You can't replicate conflicting results if a result is PROVEN to be true via pure logic.

Science is different. Nothing can ever be proven to be true. There are basically two axioms in science. We assume logic is true, just like in math. And we assume that the mathematical theory of probability is a true model for random events happening in the real world. Then we assume everything else about the universe is completely unknown. This is key.

Because the domain of the universe is unknown... at any point in time we can observe something that contradicts our initial hypothesis. ANY point in time. That means even the theory of gravity is forever open to disproof. This is the reason why NOTHING can be proven. To quote Einstein:

"No amount of experimentation can ever prove me right; a single experiment can prove me wrong."

So sure, mathematicians can make mistakes... and I know the author is talking about higher-level details... but at its most fundamental level, assuming all other attributes are ideal, it is fundamentally impossible for math to have a replication crisis, while science, on the other hand, is forever open to these crises so long as the universe is unknown.

The most interesting thing to me in all of this however is that within science, probability is a random axiom. We have no idea why probability works... it just does. It's the driver behind another fundamental concept: Entropy and the arrow of time. For some strange reason Logic in our universe exists side by side with probability as a fundamental property.


> All theorems in mathematics are proven true based off of logic. You can't replicate conflicting results if a result is PROVEN to be true via pure logic.

You can publish a paper in mathematics that claims to prove something, but is mistaken. A paper claiming that a theorem is proven, is not the same thing as the theorem being proven. However, that's not often the case—why that is, is an interesting & meaningful question.


Yeah I get that. What I'm saying is that outside of mathematics, a proof is fundamentally impossible. It can never happen.



Yeah, but nobody tries to prove anything that can't logically be proven. So not only is there no replicability crisis but such a thing can't exist in the first place.


People try all the time to prove things that can't logically be proven. Probably lifetimes have been spent on Euclid's parallel postulate alone - over 2000 years of attempts! Before Russell, Frege thought his logic was consistent. Before Gödel, it was believed we could formalize a complete arithmetic even if Frege's specific design failed. Before Church and Turing, the Entscheidungsproblem was considered potentially solvable even if Gödel's work ruled out the resulting proofs wholly capturing powerful theories.

It took a very long time to even notice the axiom of choice, and longer to prove that attempts to prove it were futile.


What I'm saying is no papers are written about it unless the author made a mistake. In the absence of a genuine logical mistake, no replicability crisis is possible.


Because math is not a science.


More like because maths are an actual science.


Huh?!?

Please read about the abc conjecture and the whole saga around its proof, Shinichi Mochizuki, Peter Scholze, etc., etc., etc.


Did you read the article? This saga is explicitly mentioned and does not detract from the author's point, IMO.


But it does! The author is just plain wrong.

Here there is a piece of very important math research which the community failed to replicate in either a positive way (confirm it) or a negative way (reject it).


Yes, but it's famous because it's the exception to the rule. It's definitely a failure to replicate, but that does not necessarily mean that math has a replication crisis in the same way the social sciences do.


> It's definitely a failure to replicate, but that does not necessarily mean that math has a replication crisis in the same way the social sciences do.

But it does! Look, in the social sciences there are a lot of lousy papers written by people who don't know statistics, acceptance criteria, etc. There are problems with "average" papers.

In math there are no problems with "average" papers, AFAIK. People follow those papers and rather quickly find flaws or confirm the results.

In math there are problems with "exceptional" papers. Holes in the proof of Fermat's Last Theorem and in the abc paper took years to find, patch, and settle.


What you mention here is not a "replication crisis", it's simply a settling of proofs that are at the advancing edge of the subject. Analysing recent papers is not what the "replication crisis" is about.

The "Replication Crisis" is about confirming long-standing results that are widely accepted, and yet when re-examined for the purpose of replication cannot be confirmed.


I fail to see the difference. "abc" was accepted - it was published in a peer-reviewed journal.

Math is different from social studies, but (and here I disagree with OP) it has its own share of replication problem(s).


and abc saga is still ongoing...


And everything (absolutely EVERYTHING) is open and published


Steven Pinker takes a decent crack at "Statistical Significance" in his new book Rationality (https://en.wikipedia.org/wiki/Rationality_(book)), which underpins a lot of this and is mentioned in this piece. And I'm still grappling with this part of the book. lol


Because it's largely not an experimental science, but rather is groups of people convincing themselves they know what someone is talking about.

As the article pretty much says.

The article also suggests an "exceptionalism" of maths in that their results hold up even when their methods don't. In other words math papers are full of "unimportant flaws and errors" but even though that's the case when corrected or verified, somehow the conclusions of the papers mostly hold up, in the author's opinion.

But I tend to think this is more because, math is groups of people (and formal verification systems) convincing themselves they know what someone else is talking about, so confirmation bias is probably in effect.

The limitations of a self-consistent set of axioms after all...

You may say that formal verification systems are perfect oracles... But if papers have "unimportant flaws and errors" it's possible that those systems have them, too, right? It's also possible that as those systems were written by people, they reflect and somehow seek to confirm the biases of those people or their understandings (or misunderstandings) about mathematical knowledge.

So I don't think maths is quite as exceptional as this author hopes it might be. Though maybe it is.



