This is essentially the "calculus vs. statistics" debate that has been going on for some time now. I'm definitely on the side of making statistics mandatory, and then making pre-calc/calc electives for motivated students who want to enter STEM fields. Primary reason for this is that I believe a base knowledge of statistics is so important for everyone in modern society, and it directly goes to better being able to evaluate the constant flood of information we are now all subjected to.
I agree in principle that statistics education should be given to all students. The problem I have with the proposal is that I have very little confidence that teachers are prepared to provide a rigorous and thorough statistical education.
Take, for example, the replication crisis in the social sciences. These studies, containing flawed statistics, are carried out by social scientists with PhDs who have been required to take courses in statistics designed for their discipline.
If they can’t get the statistics right, how can we expect it from teachers with far less education?
> Take, for example, the replication crisis in the social sciences. These studies, containing flawed statistics, are carried out by social scientists with PhDs who have been required to take courses in statistics designed for their discipline.
> If they can’t get the statistics right, how can we expect it from teachers with far less education?
This isn’t really a statistics education issue, much like Enron wasn’t an accounting education issue.
Social scientists abuse statistics because the incentives are all aligned with abusing statistics. To get a good job you have to publish lots of papers. To publish you have to have statistically significant results. To get into top journals you need surprising results. You see other people in your field playing fast and loose with their data and getting rewarded for it. Why not exclude that one problematic subject from your data analysis? Without them the p-value (from a regression on a carefully chosen eight of your eleven measured variables) drops to .038, and you can publish...
It’s not that nobody thought to teach the scientists about corrections for multiple comparisons or the dangers of picking observations to exclude as outliers based on what gives you the result you want. They learned all those things. But it’s so much harder to get results when you’re not willing to play games with the data, and they need results. The people who do try to play by the rules wash out, either by choice after getting fed up with the fraud they see all around them or by failing to publish N papers in top journals.
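To make the multiple-comparisons point concrete, here's a toy simulation (illustrative only, with made-up sample sizes) of testing eleven pure-noise variables at alpha = .05 and counting how often at least one comes out "significant":

```python
import math
import random

random.seed(0)

def z_test_p(sample, mu=0.0):
    """Two-sided p-value for the sample mean vs. mu, via the normal
    approximation (fine for illustration; a real analysis would use t)."""
    n = len(sample)
    mean = sum(sample) / n
    var = sum((x - mean) ** 2 for x in sample) / (n - 1)
    z = (mean - mu) / math.sqrt(var / n)
    return 1 - math.erf(abs(z) / math.sqrt(2))  # = 2 * (1 - Phi(|z|))

# Eleven measured variables, none with any real effect, each tested at .05.
n_vars, n_obs, trials = 11, 30, 2000
hits = sum(
    1 for _ in range(trials)
    if min(z_test_p([random.gauss(0, 1) for _ in range(n_obs)])
           for _ in range(n_vars)) < 0.05
)
false_positive_rate = hits / trials
print(f"at least one 'significant' variable in {false_positive_rate:.0%} of studies")
```

With eleven independent looks at noise you expect a hit roughly 1 − 0.95^11 ≈ 43% of the time, before any outlier-exclusion games even start.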
(Social) scientists don't necessarily know statistics even though they have had some courses. It's very common e.g. not to understand what p-value actually means. It's disturbingly common that non-linearities or heteroscedasticity are totally ignored in regression. Assumptions of statistical tests are typically not understood.
It's largely a cargo cult. For example, p-values are reported as t(N) = x, p < threshold, but very few understand why; they keep on doing it and even demand it from others.
(The why is because the p-values had to be read from tables before computers. It makes no sense nowadays.)
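For what it's worth, the exact value is a one-liner today. Here's a sketch using a plain normal test statistic (a t statistic has its own distribution, which before computers meant a table lookup — exactly the point above):

```python
import math

def normal_sf(z):
    """P(Z > z) for a standard normal, computed via the error function."""
    return 0.5 * (1 - math.erf(z / math.sqrt(2)))

z = 2.31                               # a hypothetical test statistic
p_exact = 2 * normal_sf(abs(z))        # exact two-sided p-value
print(f"exact:  p = {p_exact:.4f}")    # what any modern software can report
print("table:  p < 0.05")              # the pre-computer convention
```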
This is not the only reason why. Another persistent cause is the role of the alpha value in typical null-hypothesis testing, which leads to the (misguided) idea that a p-value of .04 is functionally equivalent to a p-value of 2.3e-10, since both are below threshold.
So I would argue this is more damning: essentially a misunderstanding of probability and of what p-values tell us at a fundamental level.
Typically you see p < 0.01 etc even with 0.05 alpha. A lot of stats software gives only those inexact values.
But yes, the interpretation of p-values and confidence levels is wildly misunderstood. p > alpha is often taken as "evidence of absence" of an effect, which is just wrong. Or when for one quantity p1 < alpha and for another p2 > alpha, it's often interpreted to mean that the quantities differ.
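That last one is Gelman and Stern's point that the difference between "significant" and "not significant" is not itself statistically significant. A small numerical sketch with made-up effect sizes:

```python
import math

def two_sided_p(z):
    """Two-sided normal p-value for test statistic z."""
    return 1 - math.erf(abs(z) / math.sqrt(2))

# Two estimated effects with the same standard error (hypothetical numbers).
est1, est2, se = 0.25, 0.15, 0.10
p1 = two_sided_p(est1 / se)          # ~0.012 -> "significant"
p2 = two_sided_p(est2 / se)          # ~0.13  -> "not significant"
# But the test of whether the two effects differ is nowhere near significant:
p_diff = two_sided_p((est1 - est2) / math.sqrt(2 * se**2))  # ~0.48
print(p1, p2, p_diff)
```

So reporting "effect 1 was significant, effect 2 was not" strongly suggests a difference that the data simply do not support.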
Most study outcomes have a political cause connected to them, even if that wasn't the author's intent. Those movements want these study outcomes to give their cause legitimacy and the membership of such movements often have a large amount of overlap with those employed in social science academia. It's little different than petrol companies that either fund or amplify studies with outcomes that benefit their business.
While there are definitely still papers being published as a result of p-hacking (intentionally or not), psychology has been undergoing a renaissance in this area for a few years now. See the "Reproducibility Project: Psychology": https://osf.io/ezcuj/wiki/home/
In short, there is a movement to validate past results, starting with those most influential in the field. A lot of progress has been made in that area.
Because all social sciences are not "roughly comparable to Enron." Hacker News is getting a bit ridiculous with the utter disdain shown for sciences and academia. The problem identified by the parent of this post is a problem of incentives that are misaligned with what we as a larger society want out of basic research. But every energy company had the same incentives as Enron. Not every energy company published fraudulent financial statements.
Poor replicability in the social sciences is multifactorial. To some extent, what they're studying is a moving target that is not all that amenable to scientific methods, which tend to assume static, reductionist laws dictating system behavior. The dynamics by which fundamental forces dictate everything from gravity to covalent molecular bonding don't change from culture to culture or drift over time. When you're studying human behaviors and preferences, I'm sure some of it is governed by more or less immutable eternal laws, but some of it is the semi-random diffusion of learned trends, and what is true today of one group of people may not be true of any other group, or even of the same group at some later point in time.
Some of it is statistical illiteracy and not understanding the limitations of the techniques you're applying.
Some of it is outright fraud.
There is also interplay between those two because outright fraud is facilitated by peer reviewers not having the statistical maturity to be able to detect it.
> Hacker News is getting a bit ridiculous with the utter disdain shown for sciences and academia.
Just to be clear, nobody actually said the social sciences are roughly equivalent to Enron. I used Enron as a hyperbolic example to emphasize that there are reasons for poor behavior other than lack of education. I was not implying that academic fraud is at the same level as Enron fraud, and after rereading what I wrote I’m comfortable with how I worded it.
> But every energy company had the same incentives as Enron. Not every energy company published fraudulent financial statements.
Accountants and corporate executives have substantial disincentives against publishing fraudulent financial statements, like going to jail. Academics mostly do not have similar disincentives against abusing statistics. There have been a few high profile embarrassments, but for the most part even people who are widely known to have p-hacked their way to dozens of questionable publications are still sitting comfortably in their tenured professorships.
Because if we did we’d have to admit just how little real information we know in so many fields. We want to think that (as a culture) we’ve got it all just about figured out.
My anecdotal experience in a psychology faculty makes me think there is an issue of statistics mastery in the social sciences. There's also an issue of corruption like p-hacking, but so far the statistics courses I've had were poor and the teachers didn't really understand what they were teaching.
Yes, but this is not necessarily (only) because of the teacher quality. Statistical testing is quite complicated and very easy to do wrong (violate assumptions). On average scientists aren't very technically or mathematically apt, so in the few short courses things have to be taught in a "cookbook style".
Short-term individual and small-group behavior in artificial conditions can be somewhat reliably measured. How well, and to what situations, these results generalize is a harder question.
Larger groups and timescales are (in practice) outside the domain of science (in its most rigorous form) and must be studied with other methods and epistemological criteria. This also leads to major abuse of statistics, as the inference has to be done with unrealistic assumptions and unwieldy models.
The "brand" of science is so strong that many fields, especially economics, want to appropriate it even though they don't and can't do science in the strict definition.
> If they can’t get the statistics right, how can we expect it from teachers with far less education?
Because it’s their job—and if the statistics curriculum is mandatory, then the teachers might spend inservice days developing statistics curriculum, going to workshops to learn how to teach statistics, etc. Teachers will develop statistics curriculums and share them with each other. With NSF grants, you can fund teacher outreach programs, put statistics exhibits in science museums, etc.
A lot of teachers will struggle to teach various subjects. That’s why we have support networks in place, to help develop curriculums and provide training for teachers. I know that the support network has a lot of problems—but many teachers would struggle to teach, say, biology, history, or algebra, too, without support.
I think what's far more likely is that the curriculum-writers will see it as an opportunity to dumb down the math. They'll make it into a very basic course on study design and data collection, with a handful of descriptive statistics thrown in so the students have something to do with their scientific calculator's STAT mode.
Will they teach discrete and continuous probability distributions? The binomial, Poisson, and normal distributions? Dependent and independent random variables? Bayes' Theorem? Measures of statistical significance and hypothesis testing? Chi-squared and student-t distributions? Confidence intervals and p-values? Maximum likelihood estimation?
Highly doubtful, since many of those topics are built on top of university-level calculus.
People who have some understanding of study design and data collection would be in a much better spot to understand and interpret day-to-day news / “information flood” than those who have done a lot of calculus-based probability. You can go all the way through rigorous measure-theoretical probability and come away with almost nothing useful for interpreting a study.
Most problems I see with modern statistics aren’t of the form “ohhh, they fooled you by using a subtly wrong statistical metric to ascribe significance” but “the way the data was gathered/interpreted is fundamentally wrong and made to mislead”.
Some of the problems come from statistics itself. “All models are wrong, but some are damn hard to interpret.” Let’s make that a corollary to Box’s statement about the utility of models. Why does so much statistical analysis take place without even an expectation of human comprehension? It’s just magical sigils for most papers.
Many midlevel statistical practitioners suffer from a holier-than-thou complex, where a “correct” approach to statistical analysis might buy a little more precision at the expense of a lot of comprehension.
Box plots or bar charts with error bars, using randomized data collection. That’s like 90% of the interpretive value right there. Statistics is a UI for math, and it could use improvement if we expect so much from it.
See Bret Victor’s “Kill Math” for more context on why we should expect more from our mathematical interfaces.
http://worrydream.com/KillMath/
> "Math" consists of assigning meaning to a set of symbols, blindly shuffling around these symbols according to arcane rules, and then interpreting a meaning from the shuffled result. The process is not unlike casting lots.
@dr_dshiv Edit: Sorry about the bad quote, I should have included the prefix "When most people speak of Math ...". And the negative comment about the blog post.
Now I've read / skimmed all of it and it was interesting. I hope the new methods he wants to use for teaching maths will work fine. (I'm sceptical, but it still seems worth a try.)
I think his project title does his project a disservice: "Kill maths"? That sounds silly to me.
And how it starts -- I got annoyed and stopped reading (until I went back two days later).
Another more positive project title maybe could be "Maths for everyone" or "A new approach for teaching maths"?
In my experience they (try to) teach those somewhat. Not in a calculus-type rigor, or even very mathematically, but conceptually yes.
The problem is that those things are not "immediately" needed, so students don't learn them or immediately forget them if they do. What students "immediately need" is to run some test in some application and check if some value passes some magical threshold.
These students then become researchers and these researchers become professors.
>I think what's far more likely is that the curriculum-writers will see it as an opportunity to dumb down the math.
I'd go as far as to argue that this has already happened. The reason math doesn't really teach problem solving, and instead opts for working through things with formulas etc., is precisely because that's a dumbed-down version of math.
Not all topics are equally easy to teach and assess in a deep way, with limited time and resources.
Statistics is IMO a lot like security. Unless it's at a very high level, just follow a basic check list and don't do anything creative. Calculus is more like algorithms - you can get to deep and creative levels at an earlier stage.
My first real stats course was a 300-level engineering stats class that kicked my ass pretty well.
The class had a calc 3 prereq, and I found the computation generally the easiest part. Truly grasping the topic takes patience and work, but it's pretty rewarding. That said, it must be difficult to find genuinely insightful instructors who can make the material remotely interesting, because good god it's necessary.
I think Americans do calc 1, which is the basic derivatives and integration you typically learn in high school, then calc 2, which is more like a college calculus course, then calc 3, which is multivariable calculus.
I don't recall if multivariable was pushed in at the end of my differential and integral calculus course or at the beginning of my differential equations course. It's possible it was also somehow tucked into my linear algebra course (though, I doubt it).
In any case, we did cover multivariable as a pretty straightforward extension of single-variable calculus, without making it a separate course. Do I likely have some huge blindspot as a result of not spending a full course on multivariable calc?
(All of my formal education was in the U.S., for what that's worth. Though, it was an accelerated magnet program teaching middle school students algebra and trigonometry and covering geometry, calculus, linear algebra, and differential equations in high school.)
Did you cover Jacobian matrices, and how to calculate and classify local extrema of a multivariate function? Do you remember saddle points? If not, you did miss multivariable calculus.
The main thing you lose is that you don't know how to apply calculus in nonlinear coordinate systems like spherical coordinates and so on. It is useful for data analysis if your data is easier to work with after a nonlinear transformation, but if you don't work with that sort of thing, then it's probably not very useful.
Then you did multivariate calculus in the linear algebra course. It isn't that strange to do it that way, since the hard parts of multivariate calculus have more to do with linear algebra than with calculus.
Thinking back to my own statistics at university… how do you even teach statistics without a lot of other math first?
Analysing data is similar. I'm all for it; it's clearly useful on the days when my main working tools are Slack and PowerPoint, but it's not clear to me how to teach that without math. One of my math textbooks even had the word "analysis" in its title. ("Calculus and analytic geometry", perhaps?)
You can absolutely teach basic statistical literacy without too much other maths and certainly without any higher maths. That’s how it’s done in schools in other countries - for example here is the statistics component of the maths GCSE for one of the exam boards in the UK (the others will be similar) https://www.aqa.org.uk/subjects/mathematics/gcse/mathematics...
GCSE starts after the third year of secondary school in the UK so most people will have completed their GCSE in the year they turn 16. Then 16-18 year olds wanting to study STEM subjects go on to do maths A-level, which includes more topics that in the US would be considered precalc as well as some calculus, more probability and statistics and some mechanics. Here’s the A-level syllabus from the same board. As you can see, for stats it includes distributions and hypothesis testing, so much more depth https://www.aqa.org.uk/subjects/mathematics/as-and-a-level/m...
A-level maths is not a necessity for all STEM courses. My daughter is doing AS maths (halfway between GCSE and A level, heavily weighted to stats), alongside A levels in Biology and Chemistry (and one non-science A level).
I'd argue that very little - or even no - math is required to teach the concepts.
The ideas of standard deviation and confidence intervals can be taught visually, for instance. You needn't be able to calculate them to understand what they mean when they're presented.
The extent of my statistics education in high school was "mean, median, mode". That was it, and I exhausted every math course that was offered at my school by the time I entered 11th grade.
You can teach basic statistics without a lot of math background.
If your class of 12 year old physics students has just timed a block of wood sliding down a sloped plank at different angles, and plotted an X/Y chart of their results, you can just have them put a best fit line through the points by eye.
No need for matrices or differentiation or X-transpose-X-inverse-X-transpose-y - just bang a line through the data by eye.
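And when the students are ready for the formula behind the by-eye line, ordinary least squares is just a few sums. A sketch with made-up plank data (angle in degrees, slide time in seconds):

```python
# Least-squares line for the sloped-plank experiment (illustrative numbers).
xs = [10, 15, 20, 25, 30, 35]        # plank angle, degrees
ys = [4.1, 3.4, 2.9, 2.3, 2.0, 1.6]  # measured slide time, seconds

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n
# slope = sum of (x-dev * y-dev) over sum of squared x-devs
slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
intercept = mean_y - slope * mean_x
print(f"y = {slope:.3f}*x + {intercept:.2f}")
```

The by-eye line the kids draw usually lands remarkably close to this, which is itself a nice lesson.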
My wife is an academic and repeatedly raises the issue of the replication crisis in medical fields too. This is an issue across all of academia, and it isn't just a matter of flawed statistics: there are pressures to publish, a lack of replication in the peer review process, a lack of replication by subsequent work building on the original research, the growing use of tools like machine learning, and so on.
Metascience is a growing field at many top universities because of this issue and the belief that modern science may be very flawed right now.
So we ram language constructs into people’s brains, which we started doing centuries ago to preserve knowledge.
Yet it comes with none of the warnings by the long dead mathematicians who initially built out statistical tooling[1]. Statistical tools are intentionally crude to help communicate the complexity of stats and yet we build society on crude leaky abstraction. Not out the obvious day to day right in front of our faces.
Modern science isn’t flawed. Society is. Boomers and GenX are proper gold star for nothings who lucked into living in the only country that could manufacture anything after WW2. They dismantled the New Deal that propped them up and made us their serfs.
They didn’t fight the war or build the economy. They have no muscle memory for doing anything “real”. They went to college (much less rigorous in the 50-70s), memorized the cliff notes and recited the catechisms. Neither generation struggled materially as any generation before them (yes yes distributions, ranges, gradients of truth; relative to material conditions before 1950s).
New generation comes along with no awareness of that time and doesn’t feel much obligation to status quo. GenX and Boomers are big mad about it, never mind they walked away from religious life. This time the future will stick to our script.
Society wants less to do with their hyper industrialized life due to war time manufacturing habits they had rammed down their throats by a much more imperialist society of decades gone.
We keep letting people who cheered on conquest of other countries to stay ahead of them keep running things. Everyone is too apathetic to tell grandpa it’s time for hospice. We let elders implicitly engage in ageism against youth.
Everyone is acting shocked the old warlords are looking for idiot soldiers to serve their fiefdoms? All they knew for their formative years was war time …hustle. Embedded deep.
Stop with the meta bullshit. Day to day life is just this. The abstract mental models are not helping. They’re distractions apathetic people escape into to avoid reality. Boomers down to apathetic centrists who love their material privilege despite the environmental toll… this culture is a joke
Yes, it is. A lot of the issues found in modern science with replicable studies come down to the publish-or-perish approach. If you have academics on temporary postdocs having to publish X papers to get an extension, find a new postdoc, or maybe get a professorship, then you're going to have issues. Add to this the lack of incentives to replicate papers under this stress, and people build on top of them rather than validating them first. Especially when the studies are expensive/time-consuming, with MRI machines etc.
> Stop with the meta bullshit
The impact bad research has both financially and in society is huge. For example the issue recently with Alzheimer's where loads of work was built up on a 2006 seminal study that wasn't replicable (because of academic fraud). Finding incentives to catch bad science early is important.
I have no idea what the rest of your message is about.
We humans are soooo good at fixing systemic issues. Surely we’ll eradicate this decades-old issue at great cost, plus the same old costs, and new costs; it’ll be just as tidy and as big a win as your tidy little post puts it. It’s so simple it needs but a paragraph to explain, after all. Really, no greed or other perverted incentives will creep up as the process moves on and government loses interest again.
The rest of my post was to suggest where the flawed incentives come from; educationally outdated meat suits and apathetic voting public who think they’re off the hook to society. Elder politicians enable such perverse incentives because they care about fiat currency flow, not science. Ignore it and dig into vacuous meta theory because the public is kowtowed by threats by the seniles
It’s amazing to me how many think choices today are guaranteed to matter tomorrow. You have no idea if what you say is possible given the state changes that occur constantly reshaping global society
In the end you’re peddling high minded BS
We can’t get the world to agree on climate change. Surely we’ll keep all hackneyed science from propagating, certainly we’ll keep the costs from ballooning to serve any perverse incentives that pop up as the public lacks any real command of the political system and its pork spending
Sure, sure. Musk will have a full colony on Mars first
To be honest, I'm not that concerned with high schoolers needing "a rigorous and thorough statistical education."
I'm more concerned with the populace at-large being able to understand and apply fundamental statistical concepts. For example, another commenter mentioned Bayes Theorem, and how it's a very powerful idea and not that difficult to grok. Related, predictive value positive and negative, and how they are calculated from (but also very different from) sensitivity and specificity, are extremely valuable concepts to understand.
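The positive-predictive-value calculation is a nice example of how little machinery this needs. A sketch with illustrative (made-up) numbers for a rare condition and a good test:

```python
# Positive predictive value via Bayes' theorem (hypothetical numbers).
sensitivity = 0.99   # P(test+ | disease)
specificity = 0.95   # P(test- | no disease)
prevalence  = 0.01   # P(disease)

true_pos  = sensitivity * prevalence          # P(test+ and disease)
false_pos = (1 - specificity) * (1 - prevalence)  # P(test+ and no disease)
ppv = true_pos / (true_pos + false_pos)       # P(disease | test+)
print(f"PPV = {ppv:.1%}")  # roughly 17%: most positives are false
```

A 99%-sensitive test where most positive results are wrong is exactly the kind of counterintuitive result that makes the concept stick.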
To your point, I think the concept of "p-hacking" is super important to understand, but I'd be less concerned about a student needing to hand-calculate the steps to run a t-test (that's what a college-level class is for). That said, I decided to look up the t-test on Wikipedia while writing this comment, and I found this interesting tidbit. Great example of the applicability of statistics, and something that I think would pique the interest of many high-schoolers:
> Gosset had been hired owing to Claude Guinness's policy of recruiting the best graduates from Oxford and Cambridge to apply biochemistry and statistics to Guinness's industrial processes.[13] Gosset devised the t-test as an economical way to monitor the quality of stout.
> If they can’t get the statistics right, how can we expect it from teachers with far less education?
I’m unconvinced that the incentives pull them towards “better use proper statistical methods” or penalize them when they don’t.
This is, IMO, a problem of incentives, not insufficient education; and if that's so, the premise that math teachers given incentives to get it right couldn't accomplish it because they have less education than a PhD sociologist is flawed.
Teachers not being skilled enough for a high school subject is not a problem in 2023.
By the time the curriculum gets reformed, it'll be like 2025. By which time students can ask GPT5 for any statistics question.
Statistics is so unbelievably broadly useful at low levels, compared to calculus. Understanding
1. Selection bias
2. Normal distribution + standard deviation
3. Central limit theorem
is a massive help in modern society. You don't even need any math equations; just understanding them on a rough conceptual level would help.
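The central limit theorem in particular can be shown rather than derived. A rough sketch: means of samples from about as non-normal a distribution as you can get (a single die roll) still pile up in a tight bell around the true mean of 3.5:

```python
import random
import statistics

random.seed(1)

def sample_mean(n):
    """Mean of n fair six-sided die rolls."""
    return statistics.mean(random.randint(1, 6) for _ in range(n))

# Collect many sample means; by the CLT they are approximately normal
# with mean 3.5 and standard deviation (die sd ~1.71) / sqrt(50) ~ 0.24.
means = [sample_mean(50) for _ in range(5000)]
mean_of_means = statistics.mean(means)
sd_of_means = statistics.stdev(means)
print(round(mean_of_means, 2), round(sd_of_means, 3))
```

A histogram of `means` makes the point visually with no equations at all.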
If we decide that statistics are more necessary in earlier education there is nothing stopping us from building a system with more teacher training for stats education.
I dreaded statistics in high school. When I started out with probability I was excited and thought this stuff was useful.
But when I got to the part about two-tailed tests and p-values, it was all just so opaque. I didn't feel it was that useful; plus, the whole argument about rejecting or not rejecting the null hypothesis felt too philosophical and contrived. And why 95% confidence?
I hated statistics throughout undergrad because of those philosophical contortions, which seemed very arbitrary to me.
It wasn't until grad school when I discovered the statistical learning side of things (PCA/PLS) that statistics became exciting to me -- because suddenly statistics became useful and able to predict things.
I'm still convinced that 80% of people don't need to know p-values and null hypotheses. Some might say: but what! These are used all over the social sciences! Umm, no.
Let the people who need to learn that learn, and actually teach statistical learning (linear regression, logistic regression, Bayesian statistics) to the rest of us in school.
Same, with the extra comment that every other math course I took had some way of verifying your result. You do your work and the result can be plugged into the original question to calculate some other element and see if it matches, or the value can be checked visually, or the equation will cancel out completely, or ... Not with the statistics as taught at the time. Your result is 0.48 - did you use the right test? was it supposed to be two tailed? did you calculate properly? did you choose the right distribution? does that method work with this distribution? Who knows. It's basically a memory exercise.
Couldn't agree more. I waited until the junior year of my undergrad to take stats. That was much, much too late.
Everyone should know Bayes' theorem. It's not that hard to grasp, but it's a wildly powerful idea. Even if you aren't plugging-and-chugging, just use it to dissect logical fallacies.
That is basic probability, not statistics. Basic probability is taught before high school in many parts of the world and is very easy to teach, there is no reason to put a lot of focus on this in high school.
You have the kids write down probability graphs from events, that takes a few lessons. Then you do some basics about confidence intervals and sample sizes. Basically every kid learns this at some point, just that they forget and now they think it is missing. Kids forget about what they learned about statistics since it doesn't seem relevant to them.
If we allowed schools to also operate as casinos, students would remember probability better. There would be a whole host of other side effects, sure, but maybe a well-educated populace is worth it.
There's an episode of The Wire, in which Presbo finds kids in his class betting on dice; so he collects the dice from all the board-games in the school store, and starts running classes on probability. The kids are totally engaged, and they start winning money from street-corner dice games.
I vaguely recall doing probability in middle school, but it was never framed in formal terms. The lessons never really transcended "let's roll some dice and draw some graphs." The college course helped me project those ideas into other domains. You can use probability theory to dissect rhetoric, for example. Never thought about that.
You learn about conditional probabilities and how to calculate around those. They teach Bayes theorem but doesn't use that name or the formula, since the formula is too hard for kids to read, but you are given the intuitive explanation for how conditional events are related to each other and you write those out in event trees where it is easy to see. I clearly remember getting taught the exact equivalent to Bayes theorem in middle school, they just didn't call it that.
If anyone who learned that sees Bayes theorem later it is intuitively obvious, so there is really no need to bring it up, and there is no reason to take a course to understand Bayes theorem later as an adult. Those things are easy thanks to what you learned in middle school.
Yeah Bayes' Theorem and confidence intervals, population vs sample means, and margins of error are all things any college graduate should understand. When you realize you can bake correlations/associations to imply whatever you want, you start taking any percentages people toss around a lot less seriously.
I agree with you that statistics is an incredibly valuable skill, but teaching it is not that different from teaching calculus. Besides notions of set theory, you need a pretty sound understanding of the concept of integration. For that you still need to understand real numbers, functions, graphing, and possibly even limits [1].
If you are a teacher, you learn pretty quickly that every time you invoke magic or "trust me, it works this way" you lose students. Students are clever and have self-respect, and they don't like being lied to, or having information hidden from them. You have to present a coherent picture, and I think teaching statistics without calculus is incredibly difficult to keep coherent.
[1] I am not sure if it's possible to argue without limits that the normal distribution extends to infinity, yet has a finite area under the curve.
If you wanted anything other than hand waving, you can't even define a normal distribution (because you can't define `exp`) without limits. `exp` is a transcendental function.
You don't have to lie to students to define the exponential function itself. They are familiar with pi, so it is not difficult to introduce another irrational number, and define f(x) = e^{-x^2}. There is no lying here.
The need for limits and such arises when you try to differentiate the exp function. But we don't need differentiation for basic statistics. Just integration.
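For the record, the fact the footnote worries about is the Gaussian integral, and its standard proof needs not just limits but a two-variable change to polar coordinates, which rather supports the "teach calculus first" position:

```latex
\left(\int_{-\infty}^{\infty} e^{-x^2}\,dx\right)^{2}
  = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} e^{-(x^2+y^2)}\,dx\,dy
  = \int_{0}^{2\pi}\!\int_{0}^{\infty} e^{-r^2}\,r\,dr\,d\theta
  = \pi,
\qquad\text{so}\qquad
\int_{-\infty}^{\infty} e^{-x^2}\,dx = \sqrt{\pi}.
```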
How do you introduce e? Just say it's 2.718...? The usual definitions of e involve a limit, and to my knowledge there's no simple geometric definition like there is for pi. Likewise I don't know of a definition of exp that doesn't involve a limit, and there's no simple geometric one that I know of like there is for sin/cos.
(There is a geometric definition of exp that I know of, but it's that it turns a vector into an integral curve, so not so useful without calculus or limits)
You also need limits to be able to talk about the central limit theorem and when a normal distribution even ought to be used. Otherwise you get confused people thinking everything is normally distributed by default.
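The central limit theorem can at least be demonstrated empirically, even if the formal statement needs limits. A made-up illustration: average n fair-die rolls many times, and the distribution of the sample mean centers on 3.5 while its spread shrinks like 1/sqrt(n):

```python
import random
import statistics

# Average n fair-die rolls, many times over; the sample mean's spread
# shrinks as n grows -- the central limit theorem in miniature.
def sample_means(n_rolls, n_samples=20_000, seed=0):
    rng = random.Random(seed)
    return [statistics.fmean(rng.randint(1, 6) for _ in range(n_rolls))
            for _ in range(n_samples)]

for n in (1, 5, 30):
    means = sample_means(n)
    print(n, round(statistics.fmean(means), 2), round(statistics.stdev(means), 3))
```

The printed standard deviations drop from roughly 1.7 (a single roll) toward roughly 0.3 (averages of 30 rolls) -- but a simulation shows the phenomenon, not why it holds or when it fails.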
Remember that my larger point is that statistics can't be taught without teaching most of calculus, so just teach calculus first anyway. But, if we hypothetically tried to get away with the minimum...
> The usual definitions of e involve a limit.... I don't know of a definition of exp that doesn't involve a limit ...
Yes, I agree with you that if you want to define e as interesting in itself, you need to use limits. Similarly, the way exp(x) was introduced to me in high school was as a function whose derivative equals itself (i.e. as an interesting function), which also requires limits, but I think my teacher/curriculum just handwaved that part away.
But in our hypothetical curriculum, I am indeed proposing that we just say that e = 2.718..., since we are not interested in e, but in its usage for defining continuous probability distributions. Then to compute something like e^2 you just plug it into the calculator (like you do sin/cos) and it will give the answer. But again, we will have to put in effort to argue that something like e^(5/4) or e^pi is a computable real number.
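As a hedged aside on "just plug it into the calculator": the calculator itself is doing nothing more exotic than arithmetic on partial sums of the series 1 + x + x^2/2! + x^3/3! + ..., so e^2 or e^pi is "computable" in a very concrete sense, even if the limit argument behind the series is deferred. A sketch:

```python
import math

# Partial sums of the exponential series: each one is plain arithmetic,
# which is roughly what the calculator button does anyway.
def exp_partial_sum(x, terms=30):
    total, term = 0.0, 1.0
    for k in range(terms):
        total += term
        term *= x / (k + 1)  # next term: x^(k+1) / (k+1)!
    return total

print(exp_partial_sum(2))        # ~7.389, agrees with math.exp(2)
print(exp_partial_sum(math.pi))  # e^pi ~ 23.14
```

Whether "the terms shrink fast enough that the sum settles down" counts as avoiding limits or quietly assuming them is, of course, the whole dispute.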
> You also need limits to be able to talk about the central limit theorem
Indeed, but I think rigorous usage of the central limit theorem is quite beyond high school mathematics.
I was agreeing with/augmenting your larger point: I don't see how you can do any justice to the subject at all without calculus, just like I don't see the point in teaching a bunch of solutions to memorize to particular setups and calling it physics.
Even Bayes' theorem is, IMO, most obvious in the continuous setting where you can interpret it in terms of relative areas, which gives a nice, easy picture. Making big tables and trees obscures the basic geometry.
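To make the "relative areas" picture concrete, here is a made-up geometric example (the regions A and B are mine, not from the thread): with (x, y) uniform on the unit square, P(A | B) is just area(A ∩ B) / area(B). Take B = {x + y < 1} and A = {x < 0.5}; conditioning on B lifts P(A) from 0.5 to 0.75:

```python
# Bayes as relative area: count grid midpoints inside B, and what
# fraction of those also land in A. P(A | B) = area(A and B) / area(B).
def conditional_area(in_A, in_B, grid=500):
    hits_B = hits_AB = 0
    for i in range(grid):
        for j in range(grid):
            x, y = (i + 0.5) / grid, (j + 0.5) / grid
            if in_B(x, y):
                hits_B += 1
                if in_A(x, y):
                    hits_AB += 1
    return hits_AB / hits_B

p = conditional_area(lambda x, y: x < 0.5, lambda x, y: x + y < 1)
print(round(p, 3))  # ~0.75: half the square is in A, but 3/4 of B is
```

The whole theorem is visible in the picture: conditioning just rescales by the area of B.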
One of the things that strikes me about the BLM and DEI movements is that the vast majority of people seem incapable of thinking in terms of distributions, and instead think in terms of stereotypes. That leads to what is, in my opinion, the very definition of racism and sexism: attributing to an individual some attributes derived, rightly or wrongly, from a metric (an average or a percentile) of a distribution, purely on the basis of skin colour or gender, instead of treating an individual as an individual who could be anywhere on those pretty wide distributions.
And I observe that even in friends who are reasonably well educated but have done little to no statistics.
This lack of basic stats skills does lead to bad public policies.
Sorry if I'm misunderstanding what you are saying, but people don't think in terms of stereotypes because they fail to look at distributions. You can still look at a distribution of people and form a stereotype, because in everyday life most people cannot gather a large enough sample to draw any statistically significant conclusions.
And in the case of BLM: I haven't followed the movement's development closely, but I think at its core (at least when it was formed) is the claim that, statistically speaking, you are more likely to be killed by police if you are black. If you look at the US prison population per capita by race, you can draw a similar conclusion. Stereotypes are a cause, not an effect. You get some of this in part because some people are racist and hold racial stereotypes.
And it is great that you are saying everyone should be treated as an individual. But we all know that this is not the case when a significant portion of people view others through the lens of stereotypes: black people, Muslims, Asians, Jews, etc. And this has real consequences for the lives of these people.
Good luck telling a TSA officer that they should treat me as an individual and not pick me "randomly" because I have the wrong skin color to avoid looking suspicious.
For the BLM, the stats actually support that you are slightly less likely to be killed by the police if you are black. But there are vastly more interactions of black people with the police, and those are aligned with crimes committed, including crimes which stats are unlikely to be affected by policing practice (eg murders). Now we can debate why the black population commits more crime, but it’s not a racist policing discussion.
But it’s even more mundane things. People are focusing on extreme percentiles of distributions, like % of engineers at google, or board members of major companies, ie looking at 0.001%-ish percentiles, where even minor differences in distributions may have a dramatic effect. And this leads to people reacting to this with “so you are saying that women engineers at google are less capable” (read many times on HN during the google memo controversy) which is absolutely not what those differences in distribution mean. They just mean you may see more of one group and less of another, but if the recruitment process is fair, all the people who passed the threshold are equally capable. And then you have to compare that to the distribution of people who actually apply, which has its own biases.
> Now we can debate why the black population commits more crime, but it’s not a racist policing discussion.
Criminal statistics are a record of who is arrested and prosecuted, not a record of who commits the most crime. If a black criminal is more likely to be arrested than a white criminal, then the statistics will reflect that.
Yes, it's more reliable to consider number of crimes reported and number of victims. I think you'll see why there are more police in black communities if you consider those statistics.
What confuses me is that this is now painted as racist and somehow Republican. I clearly remember that back in the 90s it was the Democrats, at the urging of black leaders, who supported harsher sentences and funding for more police (most prominently the Clinton crime bill [1] and three strikes laws in Washington and California [2]).
And we can see that black people still want the police to focus on their communities, as they elect leaders like Eric Adams (over the objections of white progressives), and a Gallup poll in 2020 found that 81% of black people want police to spend same amount of or more time in their area.[3]
That naturally leads to more interactions with police and more incidents when terrible things happen. Policing is always a trade off between the harms that police cause (through mistakes, misconduct, and misunderstandings) and the harms they prevent.
---
1: "the largest crime bill in the history of the United States... provided for 100,000 new police officers, $9.7 billion in funding for prisons". "Then-Senator Joe Biden of Delaware drafted the Senate version of the legislation".
2: "The first true "three-strikes" law was passed in 1993, when Washington voters approved Initiative 593. California passed its own in 1994, when their voters passed Proposition 184[16] by an overwhelming majority, with 72% in favor and 28% against."
Which is why I tend to look at homicides, as the data will be less biased. Every homicide is (materially) accounted for. And if anything, you would expect police to make less effort to identify the perpetrator in poorer black areas, which would lead to under-representation. From memory, close to 50% of homicides in the US have black perpetrators (and predominantly black victims too, which is consistent with your point). That is broadly in line with their representation in police shootings.
> For the BLM, the stats actually support that you are slightly less likely to be killed by the police if you are black
I think you meant more likely, not "less". And even in that case, this [1] shows pre-covid statistics indicating the risk of being killed by police is actually about 3 times higher for black people than for white people. That does not constitute "slightly".
> But there are vastly more interactions of black people with the police, and those are aligned with crimes committed, including crimes which stats are unlikely to be affected by policing practice (eg murders).
That would be true of any race or group of people: they have many more interactions with police than just those which result in death.
> People are focusing on extreme percentiles of distributions, like % of engineers at google, or board members of major companies, ie looking at 0.001%-ish percentiles, where even minor differences in distributions may have a dramatic effect
Yes, and that's actually how personal stereotypes work. Your experience with people is far too limited to form an opinion about a race, nationality, etc. It is not even an extreme percentile like the top or bottom; it is a small sample that people use to build their opinions.
> but if the recruitment process is fair, all the people who passed the threshold are equally capable
If the process involves human judgment or evaluation, then biases become a huge factor that prevents establishing a clear threshold, unless you have something like a standardized exam taken without human interaction. You will always be working under the assumption that the process is not 100% fair. That is actually a hard problem to solve, especially at big companies. I don't know a solution that gives the best outcome, and I don't envy the people who have to do it.
But this is a good example of the problem with statistics and the public. Fryer's analysis was highly touted, but not a single person I talked to could explain it to me or say why his results differed from other analyses.
I think the link you provided argues implicitly that looking at shootings as percentage of police interactions is wrong because policing practices may be biased (ie black population may be more policed than other segments of the population).
What I am saying is that you find a similar black over-representation in crimes that are less likely to be over-policed (homicides, whose stats should be fairly reliable, ie every incident accounted for irrespective of the race of the perpetrator, and which, if anything, would be less policed in poor neighbourhoods where gang violence is more common and less effort is made to identify the perpetrator).
I don't think anyone is disputing black over-representation in crime. Rather, the question is whether the arrests-to-shootings ratio is a good measure of racial bias. As Fryer himself notes, shootings are very different from almost anything else police do -- it's a life-changing event. In his own research he notes that excessive force does show racial bias, but it's not viewed as a life-changing event for the cop.
In fairness to Fryer (and yourself), we may not have the tools to determine this definitively either way today. And really my more important point is that as a society we should approach with caution statistical claims where we don't have a good understanding of the methods and the pros/cons of the methods. After becoming familiar with Fryer's methods in this study I'm probably leaning toward his conclusions being wrong -- but I wouldn't wager large sums of money on my leaning.
They're both. Some are a cause; others, for example "black people can't swim", are an effect of racist laws. Changing how you treat a group creates its own set of stereotypes. Then you can also get a multilayered stereotype pile from things like the cakewalk.
And that's an axiomatic error, not statistical. They hold it as an axiom that they should think in terms of percentage instead of individual outcomes. Statistics can't fix it, because you can't question axioms with mathematics alone.
AP Statistics has existed since 1996 and my high school (and many others) offered it as a path for students in addition to the calculus path. That is, you could head down the statistics path or the calculus path (or both) depending on your interests.
I finished the AP Calc path my junior year, but was told colleges would still expect to see a 4th year, so I signed up for AP Stat.
It was mind-numbing how easy it was by comparison. We got to the end of the semester and I realized we'd only just reached some of the principles from the first week of calculus. I don't remember why he was so proud to point out that we could now calculate derivatives (it was a long time ago), but I was thoroughly unimpressed.
By the second semester, I had my college acceptance letter, so I didn't bother with the other half.
The computation in calculus was the hard part... the intuition made complete sense almost all of the time. In stats it seemed the other way around. Interesting that you seem to feel otherwise, but different strokes.
I feel like it depends on the teacher at the end of the day.
At my relatively average Bay Area public high school, we offered AP Stats, AP Calc AB, AP Calc BC, and AP Physics E&M. This was the kind of HS where 10% of the student base finished BC by 10th grade and began taking Multi and Linear Algebra from our local community college or Cal.
Our AP Stats teacher integrated a lot of Calc and E&M concepts into the curriculum and stressed Probability theory quite heavily, the same way our Calc and Physics teachers dug deeper into Numerics/Interpolation than the AP Board demanded.
That said, the AP Stats test was an absolute joke.
I would like to see the basics of linear algebra taught in high school. Lack of knowledge of linear algebra is extremely insidious. If someone who doesn't know calculus encounters a calculus problem, they recognize it as something they can't solve. If someone unfamiliar with linear algebra comes across a linear algebra problem, they see it as an extremely difficult algebra or trig problem. I've personally seen programmers apply trigonometry to problems that only require addition and subtraction to solve.
I've never understood the "calculus vs statistics" argument. Calculus is an unquestionable prerequisite for any meaningful statistics work.
If you don't understand the concept of "integrating a function", then how can you possibly make sense of virtually any part of statistics? 90% of practical statistics boils down to understanding the basic algebra of normally distributed random variables and then doing basic calculus on the result.
Statistics without calculus is the worst kind of statistics, where students are taught to blindly throw tests at a problem without having a clue as to why they are doing so. Statistics without understanding is worse than no statistics at all.
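To spell out what "the basic algebra of normally distributed random variables" means, here is a small illustrative simulation (my numbers, not the commenter's): for independent X ~ N(1, 2^2) and Y ~ N(3, 1.5^2), the sum X + Y is N(1 + 3, 2^2 + 1.5^2) -- means add, and variances add:

```python
import random
import statistics

# Simulate two independent normals and check that their sum has the
# predicted mean (1 + 3 = 4) and variance (4 + 2.25 = 6.25).
rng = random.Random(42)
xs = [rng.gauss(1.0, 2.0) for _ in range(100_000)]
ys = [rng.gauss(3.0, 1.5) for _ in range(100_000)]
sums = [x + y for x, y in zip(xs, ys)]

print(round(statistics.fmean(sums), 2))      # close to 4.0
print(round(statistics.pvariance(sums), 2))  # close to 6.25
```

Knowing why variances add (and when independence fails and they don't) is exactly the kind of understanding the comment argues for.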
I don't think it's necessarily 1 class of statistics vs 1 class of calculus; the issue is that your typical calculus ladder is 4 semesters long, most of it about symbolic solutions. Handling integration and differentiation of simple polynomials is fine, but the majority of the computational part of calc 2 is just as detached from modern calculus as asking people to multiply five-digit numbers in their head.
The useful parts of calculus are typically the easiest for students, and could be taught in a semester. Instead, your typical calculus track is 2 whole years.
Calc 1 is split into three semesters; the next two of the four are other classes. Some universities combine diff eq and linear algebra in one semester. Some universities probably have classes that cover the three semesters in two. I think most prepared students arrive with AP Calc BC and just start on the last semester (multivariable...).
>> Martin said it’s also important to remember that vocational training is not the only purpose of math education.
This is true, and not just for high school but for college. Neither are out to teach specific vocational skills - rather they provide the underlying tools that students need to learn their vocation later on.
Sure we could stream kids from grade 1 for their "allocated profession", teaching them only with regard to their eventual job, but there's a reason we don't go that pretty dystopian route.
With that in mind we can argue that one curriculum is better than another for a specific path or other. But really that's a fruitless, and pointless argument. Future STEM students need Calculus, future programmers need Logic, future artists need Color Theory.
If I had to argue for a curriculum change I'd lobby for things like budgeting and basic accounting - but that's just me and everyone will have a different opinion.
I agree - as long as it's not an excuse to dumb things down statistics should be taught before Calculus. It's a quirk of history, I think, that we have it the way it is. Statistics is still going to be important to understand for 99% of STEM majors, so I don't think there's much downside to flipping the order.
The probability theory, statistics, and data analysis you need for typical everyday life are already taught in middle school. A statistics course is overkill; the problem here isn't that people weren't taught those things, it's that they don't remember them afterwards.
You can read the standards here, everyone learns this before high school:
No, I would argue that what is taught in middle school is the basics of the basics: stuff like average vs. median, etc.
I'm more talking about the high school version of statistics, and the link you gave gives good info: https://www.thecorestandards.org/Math/Content/HSS/introducti.... That curriculum is more what I'm referring to, but as far as I know is not often taught to HS students in the US.
It isn't just median vs mean basics, see for example:
"For example, estimate the mean word length in a book by randomly sampling words from the book; predict the winner of a school election based on randomly sampled survey data. Gauge how far off the estimate or prediction might be."
This is proper statistical understanding, since it covers how sampling is done and confidence intervals. They just don't use the names of the theorems; they teach the understanding needed to estimate values, etc. I don't see the value in drilling math equations for calculating those values and forcing kids to memorize lots of names.
Making (pre)calc elective would have a lot of good outcomes, but it would also drag the decision point for whether someone wants to pursue a STEM career down to earlier in their life - you'd have to commit to going on the "STEM track" or the "non-STEM track" before starting High School.
For people who know they don't want to go on the STEM track by then already, the change is a net positive. However, it might also mean STEM loses a few more people from underrepresented groups as a result, who would have done well in STEM but are now locked out of that track earlier.
Pre-calc is a weird course. It includes functions, exponentials and logs which are fairly key concepts in STEM, along with a bunch of fiddly trigonometry that just becomes superfluous once you introduce complex numbers. You could teach the first part of precalc and then do whatever part of trig+complex numbers alongside calc.
this is no longer the debate. the debate is whether kids should have to take math classes at all in order to graduate. And based on what i'm seeing, the answer the schools came up with is - No.
I don't know if I'd want Linear Algebra proofs - they tend to veer into abstract results about vector spaces over arbitrary fields rather than the more useful "solve Ax=b" and "here is how AI works" parts (i.e. we should teach real-world computational linear algebra rather than the pure parts of the field).
That said, I think proofs are an important thing we should teach in high school - it's more the flavor of "real math", and the logical thinking involved is a more transferable skill than the symbol manipulation and bag-of-tricks-you-never-really-use-in-the-real-world you get from high school calculus courses. I'd just focus it on something easier to grasp, like basic number system construction.
The foundations of statistics are very much based on calculus, though. You can't really have one without the other. Even study/experiment design is at its root an optimization problem.
The foundations of statistics are very much based upon Galton observing:
"the middlemost estimate expresses the vox populi, every other estimate being condemned as too low or too high by a majority of the voters"
at a livestock weight-guessing contest, and conceiving of a measure to quantify normal variation (the standard deviation) without the use of calculus.
To be sure, this evolved into expressions that include an integral sign or two... but the foundations were laid with no more than a sigma and some division.
There's a stronger case, for those who care to make it, that statistics is more dependent on linear algebra, if one takes the view that statistics is about finding faithful lower-dimensional representations of many higher-dimensional values.
Calculus really should be thought of as dependent on linear algebra (the derivative of a function is the best linear function that approximates it; it just happens that in 1D the only linear functions are multiplication by a single number). Linear algebra seems like it would be far more useful than algebra 2 (things like conic sections?), trigonometry, and "pre-calc".
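The "best linear approximation" view can be shown numerically. A sketch with my own example function (sqrt near a = 4): the linear map L(x) = f(a) + f'(a)(x - a) tracks f, and the error shrinks quadratically as x approaches a, which is what makes it the *best* linear approximation:

```python
import math

# Derivative of sqrt at a = 4 is 1/(2*sqrt(4)) = 0.25; compare sqrt(a+dx)
# with its linear approximation 2 + 0.25*dx for shrinking dx.
a = 4.0
slope = 1 / (2 * math.sqrt(a))

for dx in (0.1, 0.01, 0.001):
    actual = math.sqrt(a + dx)
    linear = math.sqrt(a) + slope * dx
    print(dx, actual - linear)  # error shrinks roughly like -dx^2 / 64
```

In 1D the "linear function" is just multiplication by the number 0.25, which is the point being made above: the n-D picture degenerates to a single slope.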
It's possible that you could do something like algebra (middle school)->geometry (maybe introduce the notion of a group here and focus more on symmetry and not so much on figuring out the missing angle/length in a complicated diagram)->linear algebra->probability/statistics. Concurrent with linear algebra, have kids learn calculus in physics class. After physics 1, they can do chemistry and/or e&m. Do vector calculus in e&m. Basically trim all the useless stuff out of high school and add the first couple semesters of college instead. Offer analysis as an elective after linear algebra/physics 1, and put a proper account of the n-D derivative and things like the Newton–Raphson method there.
Obviously that's a STEM bound curriculum, but at least in my school growing up, you only needed algebra 1 and geometry to graduate, which the honors kids did in middle school. So I assume any curriculum more advanced than that is for STEM bound kids.
Reforming the overall approach would probably do more for numeracy than tweaking the end stage curriculum.
As it is now, you have ~3 groups of students in most K-12 math classes. A group that is bored because they have mastered the concepts being presented, a group that is benefitting from the concepts being presented and a group that doesn't have the framework to benefit from the concepts being presented.
There are of course lots of teachers that will be doing what they can to address the gaps, but it needs to be systematic.
I would much rather see a one semester course on formal logic and then a basic probability course than throwing students into the misery of spreadsheets and proprietary stats software.
Eh. I think calculus should be taught theoretically. It’s honestly a very simple concept and helpful to have intuition for. Just don’t waste time on the implementation. Folks aren’t going to remember the chain rule years later.
Chain rule is a bad example. If you cannot remember the chain rule, you do not understand the theory at all. And if you do understand what a derivative is, the chain rule is trivial.
Right, the chain rule is foundational. If you don't understand the chain rule, then related rates and lots of your real-world engineering applications of calculus would seem to require memorizing a crazy number of specific solution templates.
Though, if typical calculus courses don't spend the time to ensure a good understanding of the chain rule, that would explain why so many students seem to think calculus is a gigantic sea of random rules.
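The claim that the chain rule follows from understanding what a derivative is can be checked numerically. An illustrative sketch (the functions are my choice): for f(u) = sin(u) and g(x) = x^2, the rule says d/dx f(g(x)) = f'(g(x)) · g'(x), and a finite-difference derivative agrees:

```python
import math

# Central finite-difference approximation of a derivative.
def numeric_deriv(fn, x, h=1e-6):
    return (fn(x + h) - fn(x - h)) / (2 * h)

x = 1.3
lhs = numeric_deriv(lambda t: math.sin(t * t), x)  # numeric d/dx sin(x^2)
rhs = math.cos(x * x) * 2 * x                      # chain rule: f'(g(x)) * g'(x)
print(lhs, rhs)  # the two agree to many decimal places
```

No memorized template is involved on the right-hand side: it is just "rate of the outer function, times rate of the inner one".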