I wrote a few direct responses, but they all felt patronizing. Apologies for the length, but I'm hoping to avoid that by showing my math, as it were:
About 7 years ago, I started to wonder about a cluster of concepts that I've yet to successfully explain to anyone without getting a confused look...
Roughly, it's a curiosity about what (if anything) is in the joints and gaps and shadows around the more salient features of our existence. For example: how (if at all) does having systems for getting rid of unpleasant things (human waste, trash, unwanted products, corpses, people who are disabled/invalid/elderly, criminals...) shape our thinking and understanding?
Privately, I think about this as unraveling: pulling on loose threads anywhere I catch myself or others taking something for granted.
A consequence of picking around in these gaps and cracks is that I've grown frustrated with most of the discourse, research, and subsequent popular-press reporting that has any overlap with a human/social system. The problem is pretty ubiquitous. The kindest diagnosis I can give is that there may be a big hole in our language(s)/grammar.
So, the methodology section of the 2003 study (https://link.springer.com/content/pdf/10.3758%2FBF03196134.p...), cited first in the Wikipedia paragraph you linked, documents its sample: "The participants were 60 men and 60 women who were each paid the equivalent of $4.50 U.S. Between 20 and 30 years of age (mean age, 22.2 years), all were right-handed students enrolled in either undergraduate or graduate programs (in social science, French, or administration) at the Université de Montréal."
Likewise, here's the first sentence of the discussion: "The present results revealed relatively high accuracy levels overall. Although men did not supply significantly more correct answers than did women across the three conditions, they needed, on average, less time to carry out the mental rotation of the structures illustrated in Vandenberg and Kuse’s (1978) items."
I want to draw a few threads out of this:
1. A more honest summary of the research is something like: "60 male and 60 female college students at a single university in Montreal, aged 20 to 30, studied in the early 2000s, representing 3 majors, selected by an unspecified strategy that involved compensation of $4.50, then split into 3 subgroups of 20, each given a similar mental rotation task in a different format, performed the mental rotations with similarly high levels of accuracy, though the selected groups of male students did so around 33% faster." This is an improvement, but it still doesn't reflect how little confidence we can have that it measured anything accurate (let alone valid) about statistically indistinguishable groups today, let alone our near-total inability to answer whether this relationship is fixed and durable across meaningful timescales.
2. There are reasonable alternatives to the headline treatment of the result. Since there was no statistically significant difference in accuracy--only in response time--the narrative about men's superior performance on this task merits scrutiny. Is the difference explained by differences in confidence? Cautiousness? Familiarity with similar tasks? The study makes a halfhearted attempt to address cautiousness by asking participants to self-report whether they double-checked their answers and how difficult they thought the task was. Women consistently rated the problems as more difficult (but answered just as accurately). Why did they rate them as more difficult?
3. The study discusses the two main rotation strategies reported in the literature--holistic and analytic--and notes that holistic strategies tend to be faster than analytic ones. It also says that "The literature thus suggests that, in the standard visual presentation condition, the holistic strategies preferred by men are more efficient than the analytic strategies preferred by women." I haven't read the whole study, but I don't see any attempt to tease out to what extent the response-time differences persist or disappear when controlling for strategy. I see no discussion of, or attempt to investigate, why men and women use different strategies. But given that the primary difference is response time, that a response-time difference is noted between strategies, that strategy differences are noted between the genders, and that strategy selection looks like a cultural/experiential, non-intrinsic variable, I'm a bit slack-jawed that this isn't the headline result.
4. I see no examination or breakdown of demographics/activities that might matter. Keep in mind that while they covered 120 students in total, each individual test group is 20 students, with no discussion of how they were selected. Groups this small are vulnerable to weird samples even if they were drawn at random. How many of these students played sports? Which ones? How recently? At what level? What about adjacent recreational activities they might not consider a sport, such as shooting, skating, skateboarding, bowling, rock-climbing, kayaking, mountain-biking, yoga, running, etc.? How many practice some sort of art or craft? Do they play video games? Pilot any sort of vehicle? Does controlling for any of these affect the observed gender difference?
Moving beyond the study itself:
5. Even if the response-time difference is durable and intrinsic, it's not obvious what bearing, if any, a response-time difference on the measured task has on how well men and women can perform an engineering job. The two might not even be meaningfully correlated. Is an engineer's performance strongly bottlenecked by how quickly they rotate objects, or is this delta largely leveled out in practice by other factors? (For example, my typing speed is rarely the thing that limits how much code I write in a day.)
6. Even if such a speed difference does have a bearing on how well men and women perform an engineering job, there's no basis for assuming the magnitude of the speed difference would equal the magnitude of the job-performance difference (which could be larger or smaller).
The Wikipedia article picks one of many pieces of research to quote. Your criticisms are fair as criticisms of that specific paper. But after you've looked at a dozen of them, continuing to say that the effect is just an artifact of terrible methodology (even though most of the methodology actually is terrible) becomes much harder to support.
That said, here is a classic test showing strong gender differences that Piaget discovered by accident. Draw two cups, one tilted. Assuming that both are halfway full of water, ask the subject to draw the water line.
Until puberty, neither gender can perform this task well. After puberty, around 90% of men and 30% of women find it trivial, while the rest do not. (The most common wrong answer I have seen from women is to tilt the water line along with the cup. The second most common is to tilt it twice as far!) Performance on this task is not particularly correlated with education.
Ask a few friends and family, and it isn't hard to verify for yourself that a gender gap exists.
Interestingly, I have yet to encounter a programmer, male or female, who doesn't find it trivial. I have no idea why programming would select for people who find this task easy, but it apparently does. And the population of people who find this task straightforward has 3x as many men in it as women.
Thank you for taking the time to read this; it took me long enough to write that I was afraid it would be for naught.
I think you are reading me more narrowly than I want to be read, but I don't fault you there. I am not good at communicating this concept yet.
I picked the first study because I start at the beginning and look for loose threads.
I didn't pick on this study because I think its methodology is terrible. I just picked on the first few loose threads I found, and pulled.
I pulled on the threads to (try to) model and promote how I go about searching for things that aren't facially obvious, and to demonstrate how applying this process (showing my math) leads me inexorably to one conclusion: it is very hard to think and communicate about things like this in a sound, lucid manner.
I'm not here to poke holes in individual studies or "prove" anything, but to encourage you to look for loose threads in the epistemic fabric, here.
In closing, and to be more specific: listing a bunch of issues with the study doesn't make it wrong. Proving the study wrong is not the point. The issues highlight some of the epistemic hubris in the study and, hopefully, expose how that sort of hubris can compound as we summarize work like this and weave it into higher-level narratives. It may help to think of this as laundering unearned/unexamined assumptions at lower levels.
Last fall, my father asked what I thought about a video on YouTube that took a passage from the Book of Revelation, upcoming astral events from Stellarium, and interpreted them into an end-days prediction.
I said that it held about as much water as predictions based on numerology.
And then he asked what I would think if the astral event happened when and as described in the video.
I said I had no reason to assume Stellarium was wrong, but that the fact of some arrangement of astral bodies doesn't inherently validate any interpretation that the arrangement is significant, nor any subsequent interpretation of what its significance is.
Hormonal differences are known to cause a host of physical differences between the genders, a host of behavioral differences (have you read the research on what testosterone does?), and a great many mental differences in other species. But political correctness causes us to reject out of hand the idea that biology could cause modest differences in average mental abilities and interests in Homo sapiens.
There is no question that a few decades ago people casually accepted conclusions about gender differences that we today recognize are fallacious. But the current consensus is an absurd overreaction.
It feels like you're lobbing points from trench to trench. I am not in either trench, and wouldn't like to be (there are weird things in both trenches), so I'm doing what I can to avoid the established battle lines.
I am not trying to debate or argue or win, so there is no satisfying crunch. I am just hoping to say a few things you can chew on (without being too much of a jerk along the way).
If I were presented with the task, I would first draw a tilted water line to figure out what the "halfway filled" amount looks like. Then I would "undo" the tilt, taking care to keep the represented amount of water the same. It could be that women are simply being more methodical in their approach.
Try it with actual women. Their reactions make it clear that they recognize the right answer when presented but have no way of calling up “water is horizontal” from the problem itself.
While I think your criticism about the example given is good, you're still only addressing the specific example given. The parent poster was trying to illustrate why gender parity might not actually be the ultimate goal. The example might have problems, but that doesn't necessarily invalidate the point that was made.
Oh, for sure. I just took the first study. And it's a good example of how big the leaps are in our thinking and communication around these topics.
It's possible the others are all stellar, but my experience says it's not likely, because we aren't very good at thinking.
Lucid, complete thinking about even fairly simple things--as I imagine most if not all successful programmers have learned--is difficult.
Complex adaptive social systems, which can have the effects of events that happened thousands of years ago still rippling through them, have deep wells of unknown state and dynamics.
The only honest way to reason about them is with epistemic humility.