I would love to see these studies repeated, but this time asking whether the public believes a different set of reliable, well-studied conclusions of science. For the most part, all these studies ask is whether people trust scientific results that contradict conservative ideology.
But there are a variety of reliable scientific results that contradict other ideologies. What would happen if they were included?
Examples:
- Variance in female intelligence/math ability/other traits is smaller than for men, at roughly the rate Larry Summers famously speculated about.
- Different races within the US have different intelligence levels, and that intelligence is highly predictive of adult outcomes.
- Demand curves for labor are downward sloping.
- In the nature vs nurture battle, nature won.
- Teacher performance can be reliably evaluated with statistics (e.g. VAM).
- There is no scientifically demonstrated benefit to organic food, and no harm from GMO.
Diane Ravitch, such an unbiased source. Let's assume the counterfactual - that teacher performance has no measurable effect. The logical conclusion is that since quality is irrelevant, we should focus on cost. I assume that you support this conclusion? Diane Ravitch certainly doesn't.
The bulk of your notes show simply that variance is high, that the effect of teachers fades over time, and that the effect of teachers is small. None of this implies VAM doesn't work.
Many of your sources don't even agree with you. For instance, the Mathematica evaluation suggests a 74% accuracy rate!
Some years back I recall we discussed this, and I suggested you go learn about the difference between variance and bias. Have you done this?
Incidentally, you do bring up yet another solid scientific conclusion that cuts against non-conservative ideologies: teachers don't matter much at all.
> The bulk of your notes show simply that variance is high
There is a lot in there about bias also; that's what all the stuff about standardized tests not being on an interval scale is about. Gifted vs mainstream vs special ed students show wildly different rates of progress on standardized tests that are completely divorced from how much they're actually learning, which means that teachers' scores are largely determined by which populations they're teaching.
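A toy simulation of that concern, with entirely made-up gain rates (and note that real VAM models do try to adjust for prior scores and population, so this only illustrates the raw-gain version of the worry):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical average yearly score gains by student population. The numbers are
# invented purely to illustrate the point: if different populations progress at
# different rates on the test, a teacher's average "gain" tracks class
# composition, not teaching quality.
gain_by_group = {"gifted": 15.0, "mainstream": 10.0, "special_ed": 5.0}
noise_sd = 4.0  # per-student measurement noise

def class_mean_gain(composition, n=30):
    """Mean score gain for a class with the given population mix."""
    groups = rng.choice(list(composition), size=n, p=list(composition.values()))
    return np.mean([rng.normal(gain_by_group[g], noise_sd) for g in groups])

# Two identical (equally good) teachers, different class assignments:
teacher_a = class_mean_gain({"gifted": 0.5, "mainstream": 0.5, "special_ed": 0.0})
teacher_b = class_mean_gain({"gifted": 0.0, "mainstream": 0.5, "special_ed": 0.5})
print(f"teacher A mean gain: {teacher_a:.1f}")
print(f"teacher B mean gain: {teacher_b:.1f}")  # lower, despite identical teaching
```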
> None of this implies VAM doesn't work.
High variance absolutely implies that VAM doesn't work. If you are firing good teachers at random then only morons would go into teaching, especially given the barrier to entry.
> the Mathematica evaluation suggests a 74% accuracy rate!
In this context, 74% accurate means 100% useless for the above reason.
> another solid scientific conclusion that cuts against non-conservative ideologies: teachers don't matter much at all.
I would say that most of the research using standardized tests to 'prove' that teachers matter is wrong, but that's not the same as proving that teachers don't matter. Any research purporting to 'prove' that teachers don't matter is going to be flawed for the same reason, because you can't reliably measure teacher quality by using tests meant to evaluate students.
> High variance absolutely implies that VAM doesn't work. If you are firing good teachers at random then only morons would go into teaching.
So only a moron would trade stocks, speculate on real estate, or take a job as a salesman?
Of course, we just agreed that even if this is true, it doesn't matter. Morons are nearly as good at teaching as anyone else, and the results of good teaching fade over time anyway. So what would be the harm?
It's a simple fact of statistics - the bigger the effect size, the easier it is to measure. It's simply innumeracy to claim large effect sizes while also claiming they're impossible to measure. Why all the left wing mathematics denialism?
(Another example of left wing mathematics denialism applies to the pigeon hole principle. If you have N houses and K > N people, K - N people won't have a house.)
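On the measurement claim itself, here's a minimal power calculation (a sketch using statsmodels and standard Cohen's-d effect sizes, nothing specific to VAM): holding noise constant, bigger effects need far fewer observations to detect.

```python
# Sample size needed to detect a standardized effect d at 80% power, alpha = 0.05.
from statsmodels.stats.power import TTestIndPower

power = TTestIndPower()
for d in (0.1, 0.3, 0.5, 1.0):  # small through large standardized effect sizes
    n = power.solve_power(effect_size=d, alpha=0.05, power=0.8)
    print(f"effect size d={d}: ~{n:.0f} subjects per group")
```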
The risk vs. reward profile for these jobs is very different. A stock trader can be successful one year and make up for losses the next, so long as he is successful more often than not. If an excellent teacher has a 25% chance of being fired every evaluation session, he has nothing to compensate for this risk -- just low pay, unconscionable hours, and a very difficult, thankless job.
Most teachers are teachers because of belief in helping children and personal dedication. Flipping a coin and saying "you're fired if I get two heads in a row" every year means that is gone.
It is not innumeracy to suggest that there are so many confounding variables, nearly impossible to separate from the treatment, that this isn't a realistic or effective method for making real decisions about performance that negatively impact the careers and lives of dedicated public servants. It is innumeracy to suggest that "the bigger the effect size, the easier it is to measure" -- not because it is technically incorrect, but because it is very misleading. No matter how large the effect size, it can be very difficult to separate the effects of different variables with limited data (which is always the case -- we are testing one teacher against another).
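A toy illustration of that last point, with made-up numbers: even a genuinely large effect is hard to pin down when the variable of interest is nearly collinear with a confounder and there's only a classroom's worth of data.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 25  # roughly one classroom of observations

# "Teacher" input is nearly collinear with student background, and both have a
# large true effect (coefficient 5) on the outcome. All values are invented.
background = rng.normal(0, 1, n)
teacher = 0.9 * background + 0.1 * rng.normal(0, 1, n)
outcome = 5.0 * teacher + 5.0 * background + rng.normal(0, 1, n)

X = sm.add_constant(np.column_stack([teacher, background]))
fit = sm.OLS(outcome, X).fit()
print(fit.params)  # point estimates for (intercept, teacher, background)
print(fit.bse)     # standard errors; compare their size to the true coefficients of 5
```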
Granted, I haven't read those particular sources and I'm not sure if this is the approach they take.
Of course phrases like "left wing mathematics denialism" are purposefully incendiary, contentless, and laughably absurd -- they belong somewhere like Breitbart, not here. Most mathematicians are left-wing, for the record. I've never heard of anyone denying the pigeon hole principle in that context (or any context), but straw men are a very effective rhetorical device.
> A stock trader can be successful one year and make up for losses the next, so long as he is successful more often than not
If you don't get good returns your first couple of years, you'll likely be out permanently. With a couple of bad years on a 20 year trading record, you might be fine.
Then again, Alex's sources all note that VAM stabilizes after a number of years (between 3 and 5). So long term teachers should be fine too.
Similarly for salespeople. Like it or not, getting a professional evaluation based on a noisy objective measurement is nothing special. It happens in many professions - why should teachers be protected?
Also, I didn't realize 38.5 hours/week, 9 months/year was "unconscionable".
No good teacher only works 38.5 hours a week. They will often be awake at 1AM several nights a week grading papers and tests, they are expected to stay past contract time to meet with students, and they have significant continuing education requirements. Many also need to take on extracurricular activities or summer teaching to get enough money to live within 30 minutes of work.
Teaching is absolutely a thankless job in the actual work. Students are constantly disrupting class, calling teachers names, threatening teachers and other students. Administrators blame teachers for their students' behavior. Parents routinely shout at, insult, and threaten teachers. Other teachers are often hostile. Any perceived glory in teaching is just whitewashing the real nature of the job (kind of like military service -- lots of talk about glory and honor, but immense disrespect in day to day experiences).
"Dedicated public servants" is almost a pejorative euphemism at this point.
Then a significant majority of teachers are not "good", by your definition.
Some simple arithmetic: given that teachers are excluded from that study if they work less than 35 hours, we discover that at most a fraction (38.5-35)/(K-35) of teachers can work K hours/week. (This is based on the extreme case of a fraction x of teachers working K hours and the remaining 1-x working exactly 35 hours.) So if 50 hours/week is "good", then at most 3.5/15 ≈ 23% of teachers are good.
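A sketch of that arithmetic, if anyone wants to plug in their own K (the bound is just Markov's inequality applied to hours worked above the 35-hour floor):

```python
# If the reported mean is 38.5 hours/week and everyone in the sample works at
# least 35, the share working K or more hours is at most (38.5 - 35) / (K - 35):
# the worst case has everyone else sitting exactly at 35.
def max_share_working_at_least(k_hours, mean=38.5, floor=35.0):
    return (mean - floor) / (k_hours - floor)

for k in (50, 60, 70):
    print(f"at most {max_share_working_at_least(k):.0%} of teachers work {k}+ hours/week")
```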
As for your subjective opinions about "thankless", what evidence - if any - would cause you to change your belief?
How did they define and report "work", and are they including breaks in those averages?
I know several teachers. The burden of proof would be on you to demonstrate that teaching is a higher status profession with endlessly respectful and eager students.
When I was doing my student teaching for my teacher certification I would spend 90-100 hours a week either in a classroom, planning, grading, or doing coursework for my student teaching. It was unlikely to be any different for my first 5 years as a teacher had I gone into the profession. I make about twice as much now as I would have if I'd gone into teaching, and my work weeks now are more like 50-60 hours.
Saying that contract hours are the actual hours a teacher works is at best disingenuous. It's absolute bullshit.
The source I cited is not measuring contract hours. It's based on time diaries of work performed "yesterday". Why don't you try reading it before criticizing it?
Most likely you are simply mistaken/lying about the number of hours you worked. Don't worry - you are in good company. Most people routinely give high numbers for how much they work "usually", as compared to how much they worked last week.
> So only a moron would trade stocks, speculate on real estate, or take a job as a salesman?
All of those are examples of jobs with fairly to extremely reliable performance measurements... E.g. if a salesperson makes 20 sales in a quarter, then there is zero chance of them getting fired because their managers think they actually made -7 sales.
> Morons are nearly as good at teaching as anyone else, and the results of good teaching fade over time anyway.
Again, not accurate. The evidence that good teachers have some magical impact on student performance as measured by standardized test scores is flawed, but that doesn't have any relevance to the question of whether or not good/bad teachers exist in reality. And in fact, if you buy into the idea that "education isn't the filling of a vessel, but the lighting of a fire," then you'd expect the impact of good teachers to become larger over time. (But again, not as measured by VAM.)
> All of those are examples of jobs with fairly to extremely reliable performance measurements... E.g. if a salesperson makes 20 sales in a quarter, then there is zero chance of them getting fired because their managers think they actually made -7 sales.
Whoah, suddenly you are applying a much lower level of skepticism to performance measurements now that we aren't talking about education.
As you know, the mean standardized test score of students in a class (with no attempt to adjust for student quality) is just as objectively known as the number of sales. So I guess if we just did that, we'd be fine?
Of course, the relevant question is how does that compare to a baseline # of sales that this salesperson could be expected to get?
I knew a salesgirl doing enterprise software. In a given month she might make 0 sales, or 1, in a very good month 2. Is she a bad salesgirl because she had a month with 0 sales? That's variance, which according to you 100% invalidates a performance metric.
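A quick back-of-the-envelope on that example, assuming her sales arrive roughly as a Poisson process at one sale per month (an assumed rate, just for illustration):

```python
from scipy.stats import poisson

rate = 1.0  # assumed average sales per month for a genuinely good salesperson
print(f"P(0 sales in a month):   {poisson.pmf(0, rate):.0%}")    # roughly a third of months
print(f"P(0 sales in a quarter): {poisson.pmf(0, 3 * rate):.0%}")
```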
Teachers aren't ranked based on mean test scores, they're ranked based on their estimated contribution to student test scores.
There are two issues here: 1) how accurate the metrics are, and 2) how stable the metrics are over time.
Here are the differences between sales and teaching:
- With VAM, not only do you have stability to worry about, but you also have the question of whether or not VAM is an accurate measure of teacher contribution to student performance in the first place. With sales you might argue that the number of sales or total contribution metrics don't accurately predict CLV or something, but the ambiguity there is trivial compared with VAM.
- With respect to stability, with teaching you only get test scores once a year, which means that it takes about five years until the VAM scores of teachers become relatively stable. Whereas with sales people, you get new metrics every quarter.
- Since what matters the most in business is having enough cashflow to stay in business, the short term view matters more than the longterm view. Whereas with students, what they achieve five and ten years out is much more important than how they perform at the end of each year.
- With sales, there is zero barrier to entry, and if you get fired then you can just get another job two weeks later. With teaching you need to spend 2+ years and tens of thousands of dollars to get into the profession, and if you get fired then the best case scenario is that it takes an entire year to get another job. But often you're basically just banned from the profession.
- With sales you're also making several times more money than you would as a teacher.
If teachers were evaluated on mean test scores, we could rank them the same way we often rank salespeople. Would you be ok with this? If not, why not?
And more importantly, why don't your newfound rationalizations apply equally well to salespeople or traders?
(Incidentally, traders also are measured against a statistical model - risk adjustments + benchmark rate.)
> ...whether or not VAM is an accurate measure of teacher contribution to student performance in the first place.
And with sales, you need to determine whether your attribution/commission model is an accurate measure of each salesperson's contribution to company profits. It's the exact same problem, it uses similar statistical methods, and it has variance.
> Whereas with sales people, you get new metrics every quarter.
So test every quarter and evaluate on that basis. Done. Further, many salespeople have less information per month than teachers - for example, the enterprise salesgirl I mentioned earlier who has 0, 1 or 2 sales/month.
You seem to be desperately reaching for rationalizations that explain why teaching is fundamentally different than every other profession. Why is that?
> If teachers were evaluated on mean test scores, we could rank them the same way we often rank salespeople.
Good sales people create value by bringing in money, and sales are measured in money. Good teachers create value by teaching well, and mean test scores don't measure teaching well. That might be slightly simplistic, but that's the big picture idea.
> And with sales, you need to determine whether your attribution/commission model is an accurate measure of each salesperson's contribution to company profits
I'm not arguing that sales people are always fairly compensated, only that there is less complexity in doing so and the epistemological issues are more straightforward.
> So test every quarter and evaluate on that basis.
In sales, measuring is free, whereas testing is more like growing carrots... If you keep pulling up your carrots to check how big they are, then at the end of the summer you're not going to have any carrots.
> explain why teaching is fundamentally different than every other profession
So near as I can tell, we can't detect the dragon by looking since he's invisible. We can't use a heat meter to detect him with since the fire lives in his (perfect insulator) stomach. We can't put dust on the floor to look for fingerprints since he floats.
Would the world be any different if the dragon didn't exist?
Similarly, if "teaching well" didn't exist, how would the world be different? What testable predictions (if any) does your set of ideas make?
> In sales, measuring is free, whereas testing is more like growing carrots... If you keep pulling up your carrots to check how big they are, then at the end of the summer you're not going to have any carrots.
Giving quarterly exams will somehow destroy all learning?
This goes against pretty much all the principles of spaced repetition. I take a Hindi test every day and it sure seems to help me.
I mean, if you're willing to accept that the best teachers are the ones that improve standardized test scores the most, then the research on VAM shows that good and bad teachers exist; it's just not able to reliably differentiate between them in a reasonable amount of time.
A more straightforward 'proof' would just be looking at all students who take a class on something random like mycology and then seeing what percentage have had some level of engagement with that subject five or ten years later.
> ...teaching is fundamentally different than every other profession. Why is that?
The payoff of a good teacher shows up years down the road, not once per quarter. Teaching is also extremely politicized:
- Parents and random joes like to manipulate the system, and are likewise manipulable by less scrupulous members of the public with a hidden agenda -- this is the "think of the children" phenomenon writ large.
- Teacher class assignments are politically motivated, not random, so student quality is not random.
- There are substantial movements by parents to eliminate standardized tests because they don't want their children labeled or profiled, and yet you want to increase testing to every quarter. Those fears are somewhat justified because testing software keeps all kinds of behavioral metrics on students that are never revealed to parents.
- You have an insanely powerful union to deal with that doesn't necessarily represent the true interests of teachers.
One problem with standardized tests in my region is that teachers are not allowed to count them on grades, so students will just walk into the testing center, click "A" 150 times, and stare blankly for the rest of the time. That's hardly a reflection on teacher quality.
Student quality (and administrative or support staff manipulation of student-to-teacher assignments) varies a lot from teacher to teacher and year to year, probably much more than teacher quality. You talk about using quantitative measurements, but then treat variance like a qualitative thing that is equally ignorable regardless of scope.
>(Another example of left wing mathematics denialism applies to the pigeon hole principle. If you have N houses and K > N people, K - N people won't have a house.)
Can't win an argument because the facts aren't on your side? Bring up a red herring for a random insult! Why all the talk of left wing mathematics denialism? This is a perfect example of $GROUP_I_DISAGREE_WITH's typical meaningless sophistry!!
As much fun as the long thread now dangling beneath your post is, it also misses the point. The point is that all of what yummyfajitas posted can be backed up with peer-reviewed reliable research. As for the fact that many, most, or even all of them can also be contradicted by reliable peer-reviewed research, well, it may win the battle but lose the war to argue that. If science can produce peer-reviewed contradictory research... and it most assuredly can, because that's basically by design and broadly a good thing... then on what basis do you attack the public for the preference it shows for one side or another of the peer-reviewed research?
If I or the public in general agree with a position that only 30% of the "real scientists" in a field hold, are we "mistrusting science"? 10%? 5%? 1%?
That's actually an interesting question, I think. I certainly don't have a great answer as to how to draw the line. At least one can make a confident declaration at 0%. But the easy answers beyond that don't work, because there isn't an obvious algorithm that says how to convert collections of scientific papers into truth.
I myself have had some "big bets" out on major sciences being wrong for a while now, and at least one of them has metaphorically paid off; I've had a marker on "the dominant nutritional science of the day is majorly wrong" for about 13 years now. That would be the dominant nutritional science of the 1990s, with "excessive fat is the cause of obesity" being the primary claim, along with assorted other things. That the 1990s was wrong seems to be becoming the dominant position now at just about the maximum rate it can without anyone having to ever admit they were wrong. (Note that as a negation, I'm not claiming I did or do know what is the correct theory, just that the dominant one was wrong.) Was I "mistrusting science"? There were some scientists who led me in that direction, after all. If I was "mistrusting science", was that bad? Because it apparently wasn't very trustworthy anyhow. It is all-but-100% likely that some major fields of science today are just as wrong. What are they? I've got some thoughts but I can't prove them.
The correct relationship between the public and science is a legitimately difficult problem. It is certainly tempting for establishment (of whatever the appropriate establishment is) scientists to just yell at the public for not believing this or that element of their establishment position, but we must balance this against the fact that we have a looooot of history of establishment scientists being wrong, with at times quite significant consequences. Blind trust would be misplaced. Informed trust is an awfully tall bar. It's a hard problem.
> I myself have had some "big bets" out on major sciences being wrong for a while now, and at least one of them has metaphorically paid off;
What about a prediction market for science?
I think many insiders know some of the stuff is bunk but the public doesn't have the time or motivation to be able to make fine grained decisions about the validity of a particular line of research.
Prediction markets would make public policy decisions about investment in science more, ah, scientific perhaps.
My intuition is that 'good talkers' are getting funding over 'good walkers'. It is legitimately hard to generate informed trust, and one wonders if it is even worthwhile to make the attempt when advertising works so effectively. I would like to see the effects of applying a universal basic income just to scientists, because I think a reasonable chunk of the bullshitters would be filtered out.
A prediction market needs to have very concrete events that can be unambiguously paid out. I don't think we have one for science. If we did have such a metric for "when a scientist was correct", we could just use it directly. It isn't that useful for a prediction market to only work on multi-decade bets, plus one would have to consider the second-order effect of giving the original consensus position an even bigger monetary stake in maintaining dominance.
Am I doing this? FWIW though, I think it's generally a pretty big red flag when people make decisions without being familiar with the relevant research that exists.
I'm not aware of any ideology where "doctrinaire" belief does not require some level of self imposed blindness. Here's a few more, and I'll be equal opportunity:
- There is little evidence that free markets can drive serious innovation. Nearly all the high tech of the 20th century is a product of state research labs, state funding, or state enforced monopolies like Bell. This is true up to today e.g. Tesla and SpaceX are heavily state supported.
- There is a ton of evidence that IQ and temperament are heritable and that parenting styles (excluding extreme abuse) don't matter that much.
- Infinite exponential growth on a finite planet is impossible. Space migration may be possible but will not change this, since e.g. Mars is too far from Earth and travel is too costly for fundamental physical reasons. Eventually supply-side limits (ecological) or demand-side limits (declining birth rates, diminishing marginal utility) will end the growth era. (Personally I think we are almost there, as evidenced by near-zero interest rates.)
- The entire war on drugs is built on very bad science, or no science, and evidence from cases like Portugal's heroin decriminalization show that harm reduction works better even for hard drugs.
- Most "alternative" medicine fares no better than placebo.
... and so on.
Evidence is not why people believe things. For the most part, Ideologies (tm) are social/tribal membership signals not attempts to actually and honestly understand the world.
Unfortunately we then proceed to reason from these shibboleths; to make decisions based on things that amount to little more than intellectual football t-shirts and special handshakes.
For a very long time I've seen this as humanity's greatest weakness.
I know you are just giving examples, but it is fairly difficult for me to take you seriously when none of those examples are linked to a respectable reference.
- Variance in female intelligence/math ability/other traits is smaller than for men, at roughly the rate Larry Summers famously speculated about.
As to the first study (Hyde et al. 2008), your claim only holds for white 11th grade Minnesota students, and only in terms of standardized math scores. For Asian-Pacific Islander students, the M/F variance ratio (VR) at the 95%+ and 99%+ percentiles is 1.09 and 0.91 respectively.
As to the fourth study (Kane and Mertz 2012), they don't report VRs for the upper tails of the distribution; they focus exclusively on grade school children; and they report that the overall VR varies widely from country to country, with a mean of 1.08. So again, your claim does not follow from their results.
I haven't read the other two studies, but given the pattern so far, I don't think I'll need to.
If you read all the studies, you'll generally find the M/F variance ratio to be about 1.15. If you study sufficiently many subgroups, you'll get a result which differs from this. But that's completely expected even if no subgroup differs at all.
Here's some discussion of that issue in a completely different context:
Not when the subgroups are large enough, and the observed differences are wide enough! (This is what degree-of-freedom corrections are for, anyway: if the effect persists despite the corrections, it holds with high probability.) I shouldn't even have to point this out...
Did you read the study you are talking about? For Asians, n=219. Second, the number you are citing isn't a variance ratio, it's the M/F ratio above the 99th percentile. That ratio works out to roughly 105 men and 114 women. So we've got a deviation of about 18 people from what's expected.
Note that the article didn't do any statistical test on this number (with or without a degree-of-freedom correction). That's unsurprising, given the small sample size.
tl;dr; the subgroups are not large enough and the observed differences are not wide enough. Feel free to run a statistical test to prove me wrong.
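For what it's worth, here is a minimal sketch of the kind of test I mean, using the 105/114 counts above against a null of equal counts (this assumes comparable numbers of males and females took the exam; testing against the tail proportions implied by an overall 1.15 variance ratio would take more work):

```python
# Are 105 males vs 114 females above the cutoff distinguishable from a 50/50 split?
from scipy.stats import binomtest

males, females = 105, 114
result = binomtest(males, n=males + females, p=0.5)
print(f"M/F ratio: {males / females:.2f}")
print(f"two-sided p-value: {result.pvalue:.2f}")  # nowhere near significance
```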
Oh, I thought the SI had some kind of analysis... A priori though, I do think the differences point at misspecification. Too busy/lazy to formalize this claim though, I admit.
> Different races within the US have different intelligence levels, and that intelligence is highly predictive of adult outcomes
I think tying this to race is a mistake. The data shows such correlations, but there are also correlations between race and socioeconomic status, and socioeconomic status and intelligence. The way you phrase this implies a premature conclusion about race when in reality it's likely not about race at all.
That's an entirely reasonable skeptical point of view to take in the absence of other evidence. However, this question has also been examined. Using typical measures of socioeconomic status (e.g. an index of education + income), one finds that controlling for socioeconomic status does little to explain the black-white IQ gap. There is a sizable gap at every level of socioeconomic status. Data: http://sites.biology.duke.edu/rausher/Hm2.jpg
It's still ridiculous to tie it to 'race' because a) there is no real biological classification of race, which means all of this data using such classifications is already inherently biased, and b) mere correlation, again, does not entail causation, so there could very well be a third variable that ties it all together. I can think of a bunch right off the top of my head, like stereotype threat and the still all too common racism resulting in many educational disadvantages for black students.
I would expect that enslaving a particular group would have an effect on how their genes get mixed up over time compared to a control group of non-slaves.
It's not a mistake to tie it to race. Race, or some hidden variable correlated to it, is highly predictive even after accounting for income and similar things.
Further, the effect size is large - far larger than anything you could hope to accomplish with a teacher.
Supposing that the cause is genetic, I'm not sure how it's different if the genetic influences were caused by slavery or something prior to it. Either way, the net result is that an identifiable group of humans will have an intelligence distribution shifted to the left.
Note that my claim had nothing to do with genetics - the genetic influences are trickier to pin down than the racial differences. Also, there is significant evidence that at least part of the black/asian gap is caused by factors other than racism and genetics - namely the overperformance (relative to black Americans) of black immigrants.
> Race, or some hidden variable correlated to it, is highly predictive even after accounting for income and similar things.
Since you're convinced of this position, please provide a biological definition of each "race" you think shows this effect, and we'll see if the data actually agrees with your claims.
Everything I said is perfectly valid if you take a sociological definition of race, so demanding I provide a biological definition of one is silly. The data I cite is based entirely on self-report - to the Census, to the California Dept of Education, and similar government bodies.
You'd know this if you actually read the sources I cited.
So yes, you are right. The study was not based on a biological measurement. We cannot rule out the possibility that at every income level stupid people with pale faces are more likely to self-identify as black while smart people with dusky skin are more likely to self-identify as white.